library(glmmTMB) #for model fitting
library(car) #for regression diagnostics
library(broom) #for tidy output
library(ggfortify) #for model diagnostics
library(DHARMa) #for residual diagnostics
library(see) #for plotting residuals
library(knitr) #for kable
library(effects) #for partial effects plots
library(ggeffects) #for partial effects plots
library(emmeans) #for estimating marginal means
library(modelr) #for auxiliary modelling functions
library(tidyverse) #for data wrangling
library(lindia) #for diagnostics of lm and glm
library(performance) #for residuals diagnostics
library(sjPlot) #for outputs
library(report) #for reporting methods/results
library(easystats) #framework for stats, modelling and visualisation
GLM Example 1
1 Preparations
Load the necessary libraries
2 Scenario
Here is an example from Fowler, Cohen, and Jarvis (1998). An agriculturalist was interested in the effects of fertiliser load on the yield of grass. Grass seed was sown uniformly over an area and different quantities of commercial fertiliser were applied to each of ten 1 m² randomly located plots. Two months later the grass from each plot was harvested, dried and weighed. The data are in the file fertilizer.csv in the data folder.
| FERTILIZER | YIELD |
|---|---|
| 25 | 84 |
| 50 | 80 |
| 75 | 90 |
| 100 | 154 |
| 125 | 148 |
| ... | ... |
| FERTILIZER: | Mass of fertiliser (g/m²) - Predictor variable |
| YIELD: | Yield of grass (g/m²) - Response variable |
The aim of the analysis is to investigate the relationship between fertiliser concentration and grass yield.
3 Read in the data
We will start off by reading in the Fertiliser data. There are many functions in R that can read in a CSV file. We will use the read_csv() function as it is part of the tidyverse ecosystem.
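The import step might look like the following sketch (the path `data/fertilizer.csv` follows the data-folder convention described in the Scenario; the object name `fert` matches the summaries shown later):

```r
library(tidyverse)

## Read the fertiliser data in as a tibble
fert <- read_csv("data/fertilizer.csv")
```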
Rows: 10 Columns: 2
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (2): FERTILIZER, YIELD
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
After reading in a dataset, it is always a good idea to quickly explore a few summaries in order to ascertain whether the imported data are correctly transcribed. In particular, we should pay attention to whether there are any unexpected missing values and ensure that each variable (column) has the expected class (e.g. that variables we expected to be considered numbers are indeed listed as either <dbl> or <int> and not <char>).
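One way to generate such summaries (a sketch; `str()` is base R, and the tabular summary below it is consistent with `sjPlot`/`datawizard`-style descriptives):

```r
## Examine the structure of the imported data:
## check the dimensions, column classes and any missing values
str(fert)
```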
spc_tbl_ [10 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ FERTILIZER: num [1:10] 25 50 75 100 125 150 175 200 225 250
$ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
- attr(*, "spec")=
.. cols(
.. FERTILIZER = col_double(),
.. YIELD = col_double()
.. )
- attr(*, "problems")=<externalptr>
fert (10 rows and 2 variables, 2 shown)
ID | Name | Type | Missings | Values | N
---+------------+---------+----------+-----------+---
1 | FERTILIZER | numeric | 0 (0.0%) | [25, 250] | 10
---+------------+---------+----------+-----------+---
2 | YIELD | numeric | 0 (0.0%) | [80, 248] | 10
-----------------------------------------------------
4 Exploratory data analysis
The individual responses (\(y_i\), observed yields) are each expected to have been independently drawn from normal (Gaussian) distributions (\(\mathcal{N}\)). These distributions represent all the possible yields we could have obtained at the specific (\(i^{th}\)) level of Fertiliser. Hence the \(i^{th}\) yield observation is expected to have been drawn from a normal distribution with a mean of \(\mu_i\).
Although each distribution is expected to come from populations that differ in their means, we assume that all of these distributions have the same variance (\(\sigma^2\)).
Model formula: \[ \begin{align} y_i &\sim{} \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_1 x_i \end{align} \]
where \(y_i\) represents the \(i^{th}\) observed value, \(\beta_0\) and \(\beta_1\) represent the intercept and slope respectively, and \(\sigma^2\) represents the estimated variance.
We intend to explore the relationship between Yield and Fertiliser using simple linear regression. Such an analysis assumes:
- Normality: each population is assumed to be normally distributed. This can be initially assessed by exploring the distribution of the response via either boxplots, violin plots or histograms (depending on sample size).
- Homogeneity of variance: each population is assumed to be equally varied. This can be initially assessed by exploring the distribution of observed values around an imaginary line of best fit through the cloud of data formed by a scatterplot of Yield against Fertiliser. If the spread of points increases (or decreases) along the trend line, it is likely that the populations are not equally varied.
- Linearity: in fitting a straight line through the cloud of data, we are assuming that a linear trend is appropriate. If the trend is not linear, then the inferred relationship may misrepresent the true relationship. In addition to an imaginary line, we could fit either a loess or linear smoother through the cloud to help us assess the likelihood of non-linearity.
- Independence: to help ensure that the estimates are unbiased, we must assume that all observations are independent. In this case, we assume that the observations were collected from a randomised design in which the level of Fertiliser was applied randomly to the different patches of the field. If however, the researchers had simply moved along the field applying progressively higher Fertiliser concentrations, then there is a chance that they have introduced additional biases or confounding factors. Similarly, if the Fertiliser levels were applied in increasing doses over time, there might also be biases. In the absence of any spatial or temporal information associated with the collection of data, we cannot directly assess this assumption.
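The graphical checks described above might be sketched as follows with `ggplot2` (loaded as part of the tidyverse; `fert` is the imported data):

```r
library(tidyverse)

## Scatterplot of Yield against Fertiliser with two smoothers:
## a loess smoother to assess (non-)linearity and spread of points
## around the trend (homogeneity of variance), and a linear smoother
## for comparison
ggplot(fert, aes(x = FERTILIZER, y = YIELD)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  geom_smooth(method = "lm", se = FALSE, colour = "red")

## Boxplot of the response to provide an initial assessment of normality
ggplot(fert, aes(y = YIELD)) +
  geom_boxplot()
```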
Conclusions:
- there is no evidence of non-normality, non-homogeneity of variance or non-linearity
- we have no initial reason to suspect that the assumptions will not be satisfied and thus it is reasonable to assume that the results will be reliable.
5 Fit the model
The lm() function returns a list comprising the estimated coefficients, the residuals, the data used to fit the model and various other attributes associated with the model fitting procedure.
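The fitting and inspection steps implied here might be (the object name `fert.lm` is illustrative; the call matches the one recorded in the output below):

```r
## Fit the simple linear regression via OLS
fert.lm <- lm(YIELD ~ FERTILIZER, data = fert)

## Inspect the components of the returned list
attributes(fert.lm)
str(fert.lm)
```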
$names
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
List of 12
$ coefficients : Named num [1:2] 51.933 0.811
..- attr(*, "names")= chr [1:2] "(Intercept)" "FERTILIZER"
$ residuals : Named num [1:10] 11.78 -12.5 -22.79 20.93 -5.36 ...
..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
$ effects : Named num [1:10] -517.03 184.25 -23.73 18.66 -8.96 ...
..- attr(*, "names")= chr [1:10] "(Intercept)" "FERTILIZER" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:10] 72.2 92.5 112.8 133.1 153.4 ...
..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:10, 1:2] -3.162 0.316 0.316 0.316 0.316 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:10] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "FERTILIZER"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.32 1.27
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 8
$ xlevels : Named list()
$ call : language lm(formula = YIELD ~ FERTILIZER, data = fert)
$ terms :Classes 'terms', 'formula' language YIELD ~ FERTILIZER
.. ..- attr(*, "variables")= language list(YIELD, FERTILIZER)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "YIELD" "FERTILIZER"
.. .. .. ..$ : chr "FERTILIZER"
.. ..- attr(*, "term.labels")= chr "FERTILIZER"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(YIELD, FERTILIZER)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "YIELD" "FERTILIZER"
$ model :'data.frame': 10 obs. of 2 variables:
..$ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
..$ FERTILIZER: num [1:10] 25 50 75 100 125 150 175 200 225 250
..- attr(*, "terms")=Classes 'terms', 'formula' language YIELD ~ FERTILIZER
.. .. ..- attr(*, "variables")= language list(YIELD, FERTILIZER)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "YIELD" "FERTILIZER"
.. .. .. .. ..$ : chr "FERTILIZER"
.. .. ..- attr(*, "term.labels")= chr "FERTILIZER"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(YIELD, FERTILIZER)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "FERTILIZER"
- attr(*, "class")= chr "lm"
YIELD FERTILIZER
1 84 25
2 80 50
3 90 75
4 154 100
5 148 125
6 169 150
7 206 175
8 244 200
9 212 225
10 248 250
(Intercept) FERTILIZER
51.9333333 0.8113939
Rather than directly access these attributes (which may have different names in different modelling functions), there are extraction functions that reach into the model output and extract the information in a convenient way.
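For example, the standard extractor generics (sketched here against the illustrative `fert.lm` object) produce the output that follows:

```r
coef(fert.lm)       # estimated coefficients (intercept and slope)
fitted(fert.lm)     # fitted (predicted) values at each observation
residuals(fert.lm)  # observed minus fitted values
```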
(Intercept) FERTILIZER
51.9333333 0.8113939
1 2 3 4 5 6 7 8
72.21818 92.50303 112.78788 133.07273 153.35758 173.64242 193.92727 214.21212
9 10
234.49697 254.78182
1 2 3 4 5 6 7
11.781818 -12.503030 -22.787879 20.927273 -5.357576 -4.642424 12.072727
8 9 10
29.787879 -22.496970 -6.781818
function (object, type = c("working", "response", "deviance",
"pearson", "partial"), ...)
NULL
The lm() function returns a list comprising the estimated coefficients, the residuals, the data used to fit the model and various other attributes associated with the model fitting procedure.
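The call recorded in the output below centres the predictor with `center()` (from the `datawizard` package, part of the `easystats` framework); a sketch:

```r
## Fit the model with a centred predictor - the intercept now represents
## the expected yield at the mean fertiliser level (137.5 g/m^2), while
## the slope is unchanged
fert.lm2 <- lm(YIELD ~ center(FERTILIZER), data = fert)
str(fert.lm2)
```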
$names
[1] "coefficients" "residuals" "effects" "rank"
[5] "fitted.values" "assign" "qr" "df.residual"
[9] "xlevels" "call" "terms" "model"
$class
[1] "lm"
List of 12
$ coefficients : Named num [1:2] 163.5 0.811
..- attr(*, "names")= chr [1:2] "(Intercept)" "center(FERTILIZER)"
$ residuals : Named num [1:10] 11.78 -12.5 -22.79 20.93 -5.36 ...
..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
$ effects : Named num [1:10] -517.03 184.25 -23.73 18.66 -8.96 ...
..- attr(*, "names")= chr [1:10] "(Intercept)" "center(FERTILIZER)" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:10] 72.2 92.5 112.8 133.1 153.4 ...
..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:10, 1:2] -3.162 0.316 0.316 0.316 0.316 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:10] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "center(FERTILIZER)"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.32 1.27
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 8
$ xlevels : Named list()
$ call : language lm(formula = YIELD ~ center(FERTILIZER), data = fert)
$ terms :Classes 'terms', 'formula' language YIELD ~ center(FERTILIZER)
.. ..- attr(*, "variables")= language list(YIELD, center(FERTILIZER))
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "YIELD" "center(FERTILIZER)"
.. .. .. ..$ : chr "center(FERTILIZER)"
.. ..- attr(*, "term.labels")= chr "center(FERTILIZER)"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(YIELD, center(FERTILIZER, center = 137.5, verbose = FALSE))
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "YIELD" "center(FERTILIZER)"
$ model :'data.frame': 10 obs. of 2 variables:
..$ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
..$ center(FERTILIZER): 'dw_transformer' num [1:10] -112.5 -87.5 -62.5 -37.5 -12.5 ...
.. ..- attr(*, "center")= num 138
.. ..- attr(*, "scale")= num 1
.. ..- attr(*, "robust")= logi FALSE
..- attr(*, "terms")=Classes 'terms', 'formula' language YIELD ~ center(FERTILIZER)
.. .. ..- attr(*, "variables")= language list(YIELD, center(FERTILIZER))
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "YIELD" "center(FERTILIZER)"
.. .. .. .. ..$ : chr "center(FERTILIZER)"
.. .. ..- attr(*, "term.labels")= chr "center(FERTILIZER)"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(YIELD, center(FERTILIZER, center = 137.5, verbose = FALSE))
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "center(FERTILIZER)"
- attr(*, "class")= chr "lm"
YIELD center(FERTILIZER)
1 84 -112.5
2 80 -87.5
3 90 -62.5
4 154 -37.5
5 148 -12.5
6 169 12.5
7 206 37.5
8 244 62.5
9 212 87.5
10 248 112.5
(Intercept) center(FERTILIZER)
163.5000000 0.8113939
Rather than directly access these attributes (which may have different names in different modelling functions), there are extraction functions that reach into the model output and extract the information in a convenient way.
(Intercept) center(FERTILIZER)
163.5000000 0.8113939
1 2 3 4 5 6 7 8
72.21818 92.50303 112.78788 133.07273 153.35758 173.64242 193.92727 214.21212
9 10
234.49697 254.78182
1 2 3 4 5 6 7
11.781818 -12.503030 -22.787879 20.927273 -5.357576 -4.642424 12.072727
8 9 10
29.787879 -22.496970 -6.781818
function (object, type = c("working", "response", "deviance",
"pearson", "partial"), ...)
NULL
The glmmTMB() function returns a list comprising the estimated coefficients, the residuals, the data used to fit the model and various other attributes associated with the model fitting procedure.
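The call captured in the output below fits the equivalent model by maximum likelihood (Gaussian family with identity link is the `glmmTMB` default); a sketch:

```r
library(glmmTMB)

## Fit the model with glmmTMB: no zero-inflation (~0) and a
## constant dispersion (~1)
fert.glmmTMB <- glmmTMB(YIELD ~ FERTILIZER, data = fert,
                        ziformula = ~0, dispformula = ~1)
attributes(fert.glmmTMB)
str(fert.glmmTMB)
```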
$names
[1] "obj" "fit" "sdr" "call" "frame" "modelInfo"
[7] "fitted"
$class
[1] "glmmTMB"
List of 7
$ obj :List of 10
..$ par : Named num [1:3] 1 1 0
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ fn :function (x = last.par[lfixed()], ...)
..$ gr :function (x = last.par[lfixed()], ...)
..$ he :function (x = last.par[lfixed()], atomic = usingAtomics())
..$ hessian : logi FALSE
..$ method : chr "BFGS"
..$ retape :function (set.defaults = TRUE)
..$ env :<environment: 0x563bf2ee89c0>
..$ report :function (par = last.par)
..$ simulate:function (par = last.par, complete = FALSE)
$ fit :List of 7
..$ par : Named num [1:3] 51.933 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ objective : num 42.5
..$ convergence: int 0
..$ iterations : int 33
..$ evaluations: Named int [1:2] 37 34
.. ..- attr(*, "names")= chr [1:2] "function" "gradient"
..$ message : chr "relative convergence (4)"
..$ parfull : Named num [1:3] 51.933 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
$ sdr :List of 8
..$ value : num(0)
..$ sd : num(0)
..$ cov : logi[0 , 0 ]
..$ par.fixed : Named num [1:3] 51.933 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ cov.fixed : num [1:3, 1:3] 1.35e+02 -7.70e-01 -3.53e-06 -7.70e-01 5.60e-03 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:3] "beta" "beta" "betad"
.. .. ..$ : chr [1:3] "beta" "beta" "betad"
..$ pdHess : logi TRUE
..$ gradient.fixed: num [1, 1:3] -1.88e-07 -9.95e-06 -3.76e-07
..$ env :<environment: 0x563bf2849890>
..- attr(*, "class")= chr "sdreport"
$ call : language glmmTMB::glmmTMB(formula = YIELD ~ FERTILIZER, data = fert, ziformula = ~0, dispformula = ~1)
$ frame :'data.frame': 10 obs. of 2 variables:
..$ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
..$ FERTILIZER: num [1:10] 25 50 75 100 125 150 175 200 225 250
..- attr(*, "terms")=Classes 'terms', 'formula' language YIELD ~ FERTILIZER + 0 + 1
.. .. ..- attr(*, "variables")= language list(YIELD, FERTILIZER)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "YIELD" "FERTILIZER"
.. .. .. .. ..$ : chr "FERTILIZER"
.. .. ..- attr(*, "term.labels")= chr "FERTILIZER"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(YIELD, FERTILIZER)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "FERTILIZER"
$ modelInfo:List of 13
..$ nobs : int 10
..$ respCol : Named int 1
.. ..- attr(*, "names")= chr "YIELD"
..$ grpVar : NULL
..$ family :List of 12
.. ..$ family : chr "gaussian"
.. ..$ link : chr "identity"
.. ..$ linkfun :function (mu)
.. ..$ linkinv :function (eta)
.. ..$ variance :function (mu)
.. ..$ dev.resids:function (y, mu, wt)
.. ..$ aic :function (y, n, mu, wt, dev)
.. ..$ mu.eta :function (eta)
.. ..$ initialize: expression({ n <- rep.int(1, nobs) if (is.null(etastart) && is.null(start) && is.null(mustart) && ((family$link| __truncated__
.. ..$ validmu :function (mu)
.. ..$ valideta :function (eta)
.. ..$ dispersion: num NA
.. ..- attr(*, "class")= chr "family"
..$ contrasts: NULL
..$ reTrms :List of 2
.. ..$ cond:List of 1
.. .. ..$ terms:List of 1
.. .. .. ..$ fixed:Classes 'terms', 'formula' language YIELD ~ FERTILIZER
.. .. .. .. .. ..- attr(*, "variables")= language list(YIELD, FERTILIZER)
.. .. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. .. ..$ : chr [1:2] "YIELD" "FERTILIZER"
.. .. .. .. .. .. .. ..$ : chr "FERTILIZER"
.. .. .. .. .. ..- attr(*, "term.labels")= chr "FERTILIZER"
.. .. .. .. .. ..- attr(*, "order")= int 1
.. .. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. .. ..- attr(*, "response")= int 1
.. .. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. .. ..- attr(*, "predvars")= language list(YIELD, FERTILIZER)
.. .. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "FERTILIZER"
.. ..$ zi :List of 1
.. .. ..$ terms: NULL
..$ terms :List of 3
.. ..$ cond:List of 1
.. .. ..$ fixed:Classes 'terms', 'formula' language YIELD ~ FERTILIZER
.. .. .. .. ..- attr(*, "variables")= language list(YIELD, FERTILIZER)
.. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. ..$ : chr [1:2] "YIELD" "FERTILIZER"
.. .. .. .. .. .. ..$ : chr "FERTILIZER"
.. .. .. .. ..- attr(*, "term.labels")= chr "FERTILIZER"
.. .. .. .. ..- attr(*, "order")= int 1
.. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. ..- attr(*, "response")= int 1
.. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. ..- attr(*, "predvars")= language list(YIELD, FERTILIZER)
.. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "FERTILIZER"
.. ..$ zi : NULL
.. ..$ disp:List of 1
.. .. ..$ fixed:Classes 'terms', 'formula' language ~1
.. .. .. .. ..- attr(*, "variables")= language list()
.. .. .. .. ..- attr(*, "factors")= int(0)
.. .. .. .. ..- attr(*, "term.labels")= chr(0)
.. .. .. .. ..- attr(*, "order")= int(0)
.. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. ..- attr(*, "response")= int 0
.. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. ..- attr(*, "predvars")= language list()
.. .. .. .. ..- attr(*, "dataClasses")= Named chr(0)
.. .. .. .. .. ..- attr(*, "names")= chr(0)
..$ reStruc :List of 2
.. ..$ condReStruc: list()
.. ..$ ziReStruc : list()
..$ allForm :List of 4
.. ..$ combForm :Class 'formula' language YIELD ~ FERTILIZER + 0 + 1
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ formula :Class 'formula' language YIELD ~ FERTILIZER
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ ziformula :Class 'formula' language ~0
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ dispformula:Class 'formula' language ~1
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ REML : logi FALSE
..$ map : NULL
..$ sparseX : Named logi [1:3] FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:3] "cond" "zi" "disp"
..$ parallel : int 1
$ fitted : NULL
- attr(*, "class")= chr "glmmTMB"
YIELD FERTILIZER
1 84 25
2 80 50
3 90 75
4 154 100
5 148 125
6 169 150
7 206 175
8 244 200
9 212 225
10 248 250
$par
beta beta betad
51.933316 0.811394 5.665667
$objective
[1] 42.51772
$convergence
[1] 0
$iterations
[1] 33
$evaluations
function gradient
37 34
$message
[1] "relative convergence (4)"
$parfull
beta beta betad
51.933316 0.811394 5.665667
Rather than directly access these attributes (which may have different names in different modelling functions), there are extraction functions that reach into the model output and extract the information in a convenient way.
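For a `glmmTMB` fit, the analogous extractors might be (the object name `fert.glmmTMB` is illustrative):

```r
fixef(fert.glmmTMB)      # fixed effects of the conditional model
fitted(fert.glmmTMB)     # fitted (predicted) values
residuals(fert.glmmTMB)  # response residuals
```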
Conditional model:
(Intercept) FERTILIZER
51.9333 0.8114
[1] 72.21817 92.50302 112.78787 133.07272 153.35757 173.64242 193.92727
[8] 214.21212 234.49697 254.78182
1 2 3 4 5 6 7
11.781834 -12.503017 -22.787868 20.927281 -5.357569 -4.642420 12.072729
8 9 10
29.787879 -22.496972 -6.781823
function (object, type = c("response", "pearson", "working"),
...)
NULL
The glmmTMB() function returns a list comprising the estimated coefficients, the residuals, the data used to fit the model and various other attributes associated with the model fitting procedure.
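The call captured in the output below combines `glmmTMB` with a centred predictor; a sketch:

```r
## Centred predictor: intercept is now the expected yield at the
## mean fertiliser level, slope is unchanged
fert.glmmTMB2 <- glmmTMB(YIELD ~ center(FERTILIZER), data = fert,
                         ziformula = ~0, dispformula = ~1)
str(fert.glmmTMB2)
```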
$names
[1] "obj" "fit" "sdr" "call" "frame" "modelInfo"
[7] "fitted"
$class
[1] "glmmTMB"
List of 7
$ obj :List of 10
..$ par : Named num [1:3] 1 1 0
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ fn :function (x = last.par[lfixed()], ...)
..$ gr :function (x = last.par[lfixed()], ...)
..$ he :function (x = last.par[lfixed()], atomic = usingAtomics())
..$ hessian : logi FALSE
..$ method : chr "BFGS"
..$ retape :function (set.defaults = TRUE)
..$ env :<environment: 0x555d629c1218>
..$ report :function (par = last.par)
..$ simulate:function (par = last.par, complete = FALSE)
$ fit :List of 7
..$ par : Named num [1:3] 163.5 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ objective : num 42.5
..$ convergence: int 0
..$ iterations : int 46
..$ evaluations: Named int [1:2] 54 47
.. ..- attr(*, "names")= chr [1:2] "function" "gradient"
..$ message : chr "relative convergence (4)"
..$ parfull : Named num [1:3] 163.5 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
$ sdr :List of 8
..$ value : num(0)
..$ sd : num(0)
..$ cov : logi[0 , 0 ]
..$ par.fixed : Named num [1:3] 163.5 0.811 5.666
.. ..- attr(*, "names")= chr [1:3] "beta" "beta" "betad"
..$ cov.fixed : num [1:3, 1:3] 2.89e+01 1.46e-13 1.67e-07 1.46e-13 5.60e-03 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:3] "beta" "beta" "betad"
.. .. ..$ : chr [1:3] "beta" "beta" "betad"
..$ pdHess : logi TRUE
..$ gradient.fixed: num [1, 1:3] 2.88e-08 3.56e-06 -3.31e-07
..$ env :<environment: 0x555d639e9530>
..- attr(*, "class")= chr "sdreport"
$ call : language glmmTMB::glmmTMB(formula = YIELD ~ center(FERTILIZER), data = fert, ziformula = ~0, dispformula = ~1)
$ frame :'data.frame': 10 obs. of 2 variables:
..$ YIELD : num [1:10] 84 80 90 154 148 169 206 244 212 248
..$ center(FERTILIZER): 'dw_transformer' num [1:10] -112.5 -87.5 -62.5 -37.5 -12.5 ...
.. ..- attr(*, "center")= num 138
.. ..- attr(*, "scale")= num 1
.. ..- attr(*, "robust")= logi FALSE
..- attr(*, "terms")=Classes 'terms', 'formula' language YIELD ~ center(FERTILIZER) + 0 + 1
.. .. ..- attr(*, "variables")= language list(YIELD, center(FERTILIZER))
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "YIELD" "center(FERTILIZER)"
.. .. .. .. ..$ : chr "center(FERTILIZER)"
.. .. ..- attr(*, "term.labels")= chr "center(FERTILIZER)"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(YIELD, center(FERTILIZER, center = 137.5, verbose = FALSE))
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "center(FERTILIZER)"
$ modelInfo:List of 13
..$ nobs : int 10
..$ respCol : Named int 1
.. ..- attr(*, "names")= chr "YIELD"
..$ grpVar : NULL
..$ family :List of 12
.. ..$ family : chr "gaussian"
.. ..$ link : chr "identity"
.. ..$ linkfun :function (mu)
.. ..$ linkinv :function (eta)
.. ..$ variance :function (mu)
.. ..$ dev.resids:function (y, mu, wt)
.. ..$ aic :function (y, n, mu, wt, dev)
.. ..$ mu.eta :function (eta)
.. ..$ initialize: expression({ n <- rep.int(1, nobs) if (is.null(etastart) && is.null(start) && is.null(mustart) && ((family$link| __truncated__
.. ..$ validmu :function (mu)
.. ..$ valideta :function (eta)
.. ..$ dispersion: num NA
.. ..- attr(*, "class")= chr "family"
..$ contrasts: NULL
..$ reTrms :List of 2
.. ..$ cond:List of 1
.. .. ..$ terms:List of 1
.. .. .. ..$ fixed:Classes 'terms', 'formula' language YIELD ~ center(FERTILIZER)
.. .. .. .. .. ..- attr(*, "variables")= language list(YIELD, center(FERTILIZER))
.. .. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. .. ..$ : chr [1:2] "YIELD" "center(FERTILIZER)"
.. .. .. .. .. .. .. ..$ : chr "center(FERTILIZER)"
.. .. .. .. .. ..- attr(*, "term.labels")= chr "center(FERTILIZER)"
.. .. .. .. .. ..- attr(*, "order")= int 1
.. .. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. .. ..- attr(*, "response")= int 1
.. .. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. .. ..- attr(*, "predvars")= language list(YIELD, center(FERTILIZER, center = 137.5, verbose = FALSE))
.. .. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "center(FERTILIZER)"
.. ..$ zi :List of 1
.. .. ..$ terms: NULL
..$ terms :List of 3
.. ..$ cond:List of 1
.. .. ..$ fixed:Classes 'terms', 'formula' language YIELD ~ center(FERTILIZER)
.. .. .. .. ..- attr(*, "variables")= language list(YIELD, center(FERTILIZER))
.. .. .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. .. .. ..$ : chr [1:2] "YIELD" "center(FERTILIZER)"
.. .. .. .. .. .. ..$ : chr "center(FERTILIZER)"
.. .. .. .. ..- attr(*, "term.labels")= chr "center(FERTILIZER)"
.. .. .. .. ..- attr(*, "order")= int 1
.. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. ..- attr(*, "response")= int 1
.. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. ..- attr(*, "predvars")= language list(YIELD, center(FERTILIZER, center = 137.5, verbose = FALSE))
.. .. .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. .. .. ..- attr(*, "names")= chr [1:2] "YIELD" "center(FERTILIZER)"
.. ..$ zi : NULL
.. ..$ disp:List of 1
.. .. ..$ fixed:Classes 'terms', 'formula' language ~1
.. .. .. .. ..- attr(*, "variables")= language list()
.. .. .. .. ..- attr(*, "factors")= int(0)
.. .. .. .. ..- attr(*, "term.labels")= chr(0)
.. .. .. .. ..- attr(*, "order")= int(0)
.. .. .. .. ..- attr(*, "intercept")= int 1
.. .. .. .. ..- attr(*, "response")= int 0
.. .. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. .. .. ..- attr(*, "predvars")= language list()
.. .. .. .. ..- attr(*, "dataClasses")= Named chr(0)
.. .. .. .. .. ..- attr(*, "names")= chr(0)
..$ reStruc :List of 2
.. ..$ condReStruc: list()
.. ..$ ziReStruc : list()
..$ allForm :List of 4
.. ..$ combForm :Class 'formula' language YIELD ~ center(FERTILIZER) + 0 + 1
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ formula :Class 'formula' language YIELD ~ center(FERTILIZER)
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ ziformula :Class 'formula' language ~0
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..$ dispformula:Class 'formula' language ~1
.. .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
..$ REML : logi FALSE
..$ map : NULL
..$ sparseX : Named logi [1:3] FALSE FALSE FALSE
.. ..- attr(*, "names")= chr [1:3] "cond" "zi" "disp"
..$ parallel : int 1
$ fitted : NULL
- attr(*, "class")= chr "glmmTMB"
YIELD center(FERTILIZER)
1 84 -112.5
2 80 -87.5
3 90 -62.5
4 154 -37.5
5 148 -12.5
6 169 12.5
7 206 37.5
8 244 62.5
9 212 87.5
10 248 112.5
$par
beta beta betad
163.500001 0.811394 5.665667
$objective
[1] 42.51772
$convergence
[1] 0
$iterations
[1] 46
$evaluations
function gradient
54 47
$message
[1] "relative convergence (4)"
$parfull
beta beta betad
163.500001 0.811394 5.665667
Rather than directly access these attributes (which may have different names in different modelling functions), there are extraction functions that reach into the model output and extract the information in a convenient way.
Conditional model:
(Intercept) center(FERTILIZER)
163.5000 0.8114
[1] 72.21818 92.50303 112.78788 133.07273 153.35758 173.64243 193.92727
[8] 214.21212 234.49697 254.78182
1 2 3 4 5 6 7
11.781820 -12.503029 -22.787878 20.927273 -5.357576 -4.642425 12.072726
8 9 10
29.787877 -22.496972 -6.781821
function (object, type = c("response", "pearson", "working"),
...)
NULL
Let's perform Ordinary Least Squares (OLS) regression from first principles.
To do so, we express the response as a vector (\(Y\)) and the deterministic component (linear predictor) as a matrix (\(\boldsymbol{X}\)) comprising a column of 1's (for multiplying the intercept parameter) and the predictor variable (for multiplying the slope parameter). We then also have a vector of parameters (\(\beta_0\) and \(\beta_1\), representing the intercept and slope respectively).
\[ y_i = \beta_0\times 1 + \beta_1\times x_i \]
\[ \underbrace{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_i \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1&x_1\\ 1&x_2\\ 1&x_3\\ ...\\ 1&x_i \end{bmatrix}}_{\boldsymbol{X}} \underbrace{\vphantom{\begin{bmatrix} y_1\\ y_2\\ y_3\\ ...\\ y_i \end{bmatrix}}\begin{bmatrix} \beta_0\\ \beta_1 \end{bmatrix}}_{\boldsymbol{\beta}} \]
In OLS we estimate the parameters (\(\beta_0\) and \(\beta_1\)) by minimising the sum of the squared residuals (\(\boldsymbol{\varepsilon}\)).
\[ \begin{align} Y =& \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}\\ \boldsymbol{\varepsilon} =& Y - \boldsymbol{X}\boldsymbol{\beta} \end{align} \]
We can then minimise the sum of squared residuals. This is essentially \(\boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon}\), where \(\boldsymbol{\varepsilon}^T\) is \(\boldsymbol{\varepsilon}\) transposed.
\[ \small \begin{bmatrix} \varepsilon_1&\varepsilon_2&\varepsilon_3&...&\varepsilon_n \end{bmatrix} \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \varepsilon_3\\ ...\\ \varepsilon_n\\ \end{bmatrix} = \begin{bmatrix} \varepsilon_1\times\varepsilon_1 + \varepsilon_2\times\varepsilon_2 + \varepsilon_3\times\varepsilon_3 + ... + \varepsilon_n\times\varepsilon_n \end{bmatrix} \]
\[ \begin{align} \boldsymbol{\varepsilon}^T\boldsymbol{\varepsilon} =& (Y - \boldsymbol{X}\boldsymbol{\beta})^T(Y - \boldsymbol{X}\boldsymbol{\beta})\\ \boldsymbol{\beta} =& (\boldsymbol{X}^T\boldsymbol{X})^{-1} \boldsymbol{X}^T Y \end{align} \]
## Generate the model matrix
X <- model.matrix(~FERTILIZER, data = fert)
## Solve for beta
beta <- solve(t(X) %*% X) %*% t(X) %*% fert$YIELD
beta
                  [,1]
(Intercept) 51.9333333
FERTILIZER 0.8113939
The above simple linear regression model can be fit using the lm() function, which uses Ordinary Least Squares (OLS). The full model formula would include a 1 (for the intercept) and the predictor values (for the slope). In R, models include an intercept by default, so we can abbreviate the formula a little by omitting the 1.
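That is, the two calls below fit identical models:

```r
lm(YIELD ~ 1 + FERTILIZER, data = fert)  # intercept made explicit
lm(YIELD ~ FERTILIZER, data = fert)      # intercept implied by default
```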
6 Model validation
When supplied with a fitted linear model, the autoplot() function generates a set of standard diagnostic regression plots:
- the top left plot is a residual plot: Ideally, there should not be any patterns in the residual plot. Of particular concern would be if there was a wedge (funnel) shape (indicative of unequal variances) or some curved shape (indicative of having fit a linear trend through non-linear data).
- the top right plot is a Q-Q normal plot: This figure plots the quantiles of the residuals against the quantiles that would be expected if the data were drawn from a normal distribution. Ideally, the points should be close to the dashed line. Deviations from this line at the tails signify potential non-normality.
- the middle left plot is a standardised residual plot. This has a similar interpretation to the residual plot.
- the middle right plot displays the Cook’s distance values associated with each observation. Cook’s distance is a measure of the influence of each point on the estimated parameter values. It combines the residual (a measure of outlierness along the y-axis) and the leverage (a measure of outlierness along the x-axis). Essentially, it estimates the variation in the regression coefficients after removing each observation (one by one). Ideally, all the values should be under 0.8. Numbers above the bars indicate the observation number - this can be useful for identifying which observations are influential.
- the bottom left plot displays the trend between residuals and leverage. In the event of large Cook’s distance, this can help identify whether the issues are due to residuals or leverage.
- the bottom right plot displays the trend between Cook’s distance and leverage.
Conclusions:
- there are no alarming signs from any of the diagnostics.
- the estimated parameters are likely to be reliable.
Cook’s distance (cook.d) and leverage (hat) are displayed in tabular form.
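The same quantities can be pulled out directly with base R accessors. A minimal sketch (refitting the model from the table data so the chunk stands alone):

```r
## Data transcribed from the fertiliser table above
fert <- data.frame(
  FERTILIZER = seq(25, 250, by = 25),
  YIELD = c(84, 80, 90, 154, 148, 169, 206, 244, 212, 248)
)
fert_lm <- lm(YIELD ~ FERTILIZER, data = fert)

## Leverage (hat) and Cook's distance (cook.d), matching the table columns
lev <- hatvalues(fert_lm)
cd <- cooks.distance(fert_lm)
round(cbind(hat = lev, cook.d = cd), 5)
```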
Influence measures of
lm(formula = YIELD ~ FERTILIZER, data = fert) :
dfb.1_ dfb.FERT dffit cov.r cook.d hat inf
1 5.39e-01 -0.4561 0.5411 1.713 0.15503 0.345
2 -4.15e-01 0.3277 -0.4239 1.497 0.09527 0.248
3 -6.01e-01 0.4237 -0.6454 0.969 0.18608 0.176
4 3.80e-01 -0.2145 0.4634 1.022 0.10137 0.127
5 -5.77e-02 0.0163 -0.0949 1.424 0.00509 0.103
6 -2.50e-02 -0.0141 -0.0821 1.432 0.00382 0.103
7 -1.80e-17 0.1159 0.2503 1.329 0.03374 0.127
8 -2.19e-01 0.6184 0.9419 0.623 0.31796 0.176
9 3.29e-01 -0.6486 -0.8390 1.022 0.30844 0.248
10 1.51e-01 -0.2559 -0.3035 1.900 0.05137 0.345 *
YIELD FERTILIZER .hat .sigma .cooksd .fitted .resid
1 84 25 0.3454545 19.55115 0.155034335 72.21818 11.781818
2 80 50 0.2484848 19.56598 0.095267982 92.50303 -12.503030
3 90 75 0.1757576 17.95943 0.186081798 112.78788 -22.787879
4 154 100 0.1272727 18.46227 0.101366743 133.07273 20.927273
5 148 125 0.1030303 20.19832 0.005091410 153.35758 -5.357576
6 169 150 0.1030303 20.22650 0.003822883 173.64242 -4.642424
7 206 175 0.1272727 19.71511 0.033735022 193.92727 12.072727
8 244 200 0.1757576 16.08585 0.317962021 214.21212 29.787879
9 212 225 0.2484848 17.78582 0.308435562 234.49697 -22.496970
10 248 250 0.3454545 20.06254 0.051368341 254.78182 -6.781818
.stdresid
1 0.7664845
2 -0.7591147
3 -1.3211052
4 1.1790558
5 -0.2977422
6 -0.2579983
7 0.6801851
8 1.7269234
9 -1.3658913
10 -0.4412017
The performance package provides a similar set of visual diagnostics:
- the top left plot is a Q-Q normal plot.
- the top right plot compares the distribution (density) of the observed data (blue) with a normal (or other nominated) distribution (green)
- the middle two plots are residual and standardised residual plots. Ideally, these should show no patterns in the residuals
- the bottom left plot provides a frequency bar chart of Cook’s distance values.
OK: No outliers detected.
- Based on the following method and threshold: cook (0.8).
- For variable: (Whole model)
Conclusions:
- there are no alarming signs from any of the diagnostics
Although an exploration of model residuals can provide important insights into the goodness of fit and conformity of a model to the underlying assumptions, diagnosing issues is a bit of a fine art. This difficulty is exacerbated for some models (e.g. logistic regression models), where it is difficult to separate problematic patterns from the patterns expected of that particular model.
The DHARMa package generates standardised residuals via simulation and uses these as the basis of a range of tools to diagnose common modelling issues including outliers, heterogeneity, over-dispersion and autocorrelation.
New observations simulated from the fitted model are used to calculate a cumulative distribution function (CDF) that describes the probability profile of each observation. Thereafter, the residual of an observation is calculated as the value of the CDF that corresponds to the actual observed value:
- a value of 0 indicates that the observed value was less than all simulated values
- a value of 1 indicates that the observed value was greater than all simulated values
- a value of 0.5 indicates that half of the simulated values were greater than the observed value
For a correctly specified model, these quantile residuals should be uniformly distributed.
This approach ensures that all residuals have the same interpretation irrespective of the model and distribution selected.
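The idea can be illustrated with a minimal base R sketch (a conceptual illustration only, not DHARMa's actual implementation): simulate new responses from a fitted Gaussian model, then record, for each observation, the proportion of simulated values that fall below the observed value.

```r
set.seed(1)
## Toy data and model, purely for illustration
x <- 1:20
y <- 2 * x + rnorm(20, sd = 2)
m <- lm(y ~ x)

## Simulate many new response vectors from the fitted model
nsim <- 250
sims <- replicate(nsim, rnorm(length(y), mean = fitted(m), sd = sigma(m)))

## Quantile residual: proportion of simulated values below each observation
## (0 = below all simulations, 1 = above all, ~uniform if the model is correct)
q_resid <- rowMeans(sims < y)
```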
Exploring DHARMa residuals begins with running the simulateResiduals() function. If this is done with plot=TRUE, a pair of diagnostic plots with a range of diagnostic tests will be provided as side effects.
- the left hand plot is a Q-Q plot (ideally all points should be close to the red line). The KS (Kolmogorov-Smirnov) test examines whether the residuals (in this case simulated) are likely to have been drawn from the nominated distribution (in this case Gaussian).
- the Dispersion test tests whether the standard deviation of the data is equal to that of the simulated data
- the Outlier test tests for the prevalence of outliers (when observed values are outside the simulated range)
- the right hand plot is a residual plot. Ideally, there should be no patterns in the residuals. To help identify any patterns, quantile trends are overlaid; each of the 0.25, 0.5 and 0.75 quantile trends should appear as a flat black line.
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.224, p-value = 0.6209
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.90609, p-value = 0.952
alternative hypothesis: two.sided
$outliers
DHARMa outlier test based on exact binomial test with approximate
expectations
data: simulationOutput
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.224, p-value = 0.6209
alternative hypothesis: two-sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.90609, p-value = 0.952
alternative hypothesis: two.sided
DHARMa outlier test based on exact binomial test with approximate
expectations
data: fert_resid
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided
Test for location of quantiles via qgam
data: simulationOutput
p-value = 0.5887
alternative hypothesis: both
The broom package has an augment method that adds information generated from a fitted model to the modelled data.
# A tibble: 10 × 8
YIELD FERTILIZER .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 84 25 72.2 11.8 0.345 19.6 0.155 0.766
2 80 50 92.5 -12.5 0.248 19.6 0.0953 -0.759
3 90 75 113. -22.8 0.176 18.0 0.186 -1.32
4 154 100 133. 20.9 0.127 18.5 0.101 1.18
5 148 125 153. -5.36 0.103 20.2 0.00509 -0.298
6 169 150 174. -4.64 0.103 20.2 0.00382 -0.258
7 206 175 194. 12.1 0.127 19.7 0.0337 0.680
8 244 200 214. 29.8 0.176 16.1 0.318 1.73
9 212 225 234. -22.5 0.248 17.8 0.308 -1.37
10 248 250 255. -6.78 0.345 20.1 0.0514 -0.441
## we could then pipe these to ggplot in order to look at residuals
## etc
fert_lm |>
augment() |>
ggplot() +
geom_point(aes(y = .resid, x = .fitted))
When supplied with a fitted linear model, the autoplot() function generates a set of standard diagnostic regression plots:
- the top left plot is a residual plot: Ideally, there should not be any patterns in the residual plot. Of particular concern would be if there was a wedge (funnel) shape (indicative of unequal variances) or some curved shape (indicative of having fit a linear trend through non-linear data).
- the top right plot is a Q-Q normal plot: This figure plots the quantiles of the residuals against the quantiles that would be expected if the data were drawn from a normal distribution. Ideally, the points should be close to the dashed line. Deviations from this line at the tails signify potential non-normality.
- the middle left plot is a standardised residual plot. This has a similar interpretation to the residual plot.
- the middle right plot displays the Cook’s distance values associated with each observation. Cook’s distance is a measure of the influence of each point on the estimated parameter values. It combines the residual (a measure of outlierness along the y-axis) and the leverage (a measure of outlierness along the x-axis). Essentially, it estimates the variation in the regression coefficients after removing each observation (one by one). Ideally, all the values should be under 0.8. Numbers above the bars indicate the observation number - this can be useful for identifying which observations are influential.
- the bottom left plot displays the trend between residuals and leverage. In the event of large Cook’s distance, this can help identify whether the issues are due to residuals or leverage.
- the bottom right plot displays the trend between Cook’s distance and leverage.
Conclusions:
- there are no alarming signs from any of the diagnostics.
- the estimated parameters are likely to be reliable.
Cook’s distance (cook.d) and leverage (hat) are displayed in tabular form.
Influence measures of
lm(formula = YIELD ~ center(FERTILIZER), data = fert) :
dfb.1_ dfb.c.FE dffit cov.r cook.d hat inf
1 0.2911 -0.4561 0.5411 1.713 0.15503 0.345
2 -0.2689 0.3277 -0.4239 1.497 0.09527 0.248
3 -0.4868 0.4237 -0.6454 0.969 0.18608 0.176
4 0.4107 -0.2145 0.4634 1.022 0.10137 0.127
5 -0.0935 0.0163 -0.0949 1.424 0.00509 0.103
6 -0.0809 -0.0141 -0.0821 1.432 0.00382 0.103
7 0.2219 0.1159 0.2503 1.329 0.03374 0.127
8 0.7105 0.6184 0.9419 0.623 0.31796 0.176
9 -0.5322 -0.6486 -0.8390 1.022 0.30844 0.248
10 -0.1633 -0.2559 -0.3035 1.900 0.05137 0.345 *
YIELD center(FERTILIZER) .hat .sigma .cooksd .fitted .resid
1 84 -112.5 0.3454545 19.55115 0.155034335 72.21818 11.781818
2 80 -87.5 0.2484848 19.56598 0.095267982 92.50303 -12.503030
3 90 -62.5 0.1757576 17.95943 0.186081798 112.78788 -22.787879
4 154 -37.5 0.1272727 18.46227 0.101366743 133.07273 20.927273
5 148 -12.5 0.1030303 20.19832 0.005091410 153.35758 -5.357576
6 169 12.5 0.1030303 20.22650 0.003822883 173.64242 -4.642424
7 206 37.5 0.1272727 19.71511 0.033735022 193.92727 12.072727
8 244 62.5 0.1757576 16.08585 0.317962021 214.21212 29.787879
9 212 87.5 0.2484848 17.78582 0.308435562 234.49697 -22.496970
10 248 112.5 0.3454545 20.06254 0.051368341 254.78182 -6.781818
.stdresid
1 0.7664845
2 -0.7591147
3 -1.3211052
4 1.1790558
5 -0.2977422
6 -0.2579983
7 0.6801851
8 1.7269234
9 -1.3658913
10 -0.4412017
The performance package provides a similar set of visual diagnostics:
- the top left plot is a Q-Q normal plot.
- the top right plot compares the distribution (density) of the observed data (blue) with a normal (or other nominated) distribution (green)
- the middle two plots are residual and standardised residual plots. Ideally, these should show no patterns in the residuals
- the bottom left plot provides a frequency bar chart of Cook’s distance values.
OK: No outliers detected.
- Based on the following method and threshold: cook (0.8).
- For variable: (Whole model)
Conclusions:
- there are no alarming signs from any of the diagnostics
Although an exploration of model residuals can provide important insights into the goodness of fit and conformity of a model to the underlying assumptions, diagnosing issues is a bit of a fine art. This difficulty is exacerbated for some models (e.g. logistic regression models), where it is difficult to separate problematic patterns from the patterns expected of that particular model.
The DHARMa package generates standardised residuals via simulation and uses these as the basis of a range of tools to diagnose common modelling issues including outliers, heterogeneity, over-dispersion and autocorrelation.
New observations simulated from the fitted model are used to calculate a cumulative distribution function (CDF) that describes the probability profile of each observation. Thereafter, the residual of an observation is calculated as the value of the CDF that corresponds to the actual observed value:
- a value of 0 indicates that the observed value was less than all simulated values
- a value of 1 indicates that the observed value was greater than all simulated values
- a value of 0.5 indicates that half of the simulated values were greater than the observed value
For a correctly specified model, these quantile residuals should be uniformly distributed.
This approach ensures that all residuals have the same interpretation irrespective of the model and distribution selected.
Exploring DHARMa residuals begins with running the simulateResiduals() function. If this is done with plot=TRUE, a pair of diagnostic plots with a range of diagnostic tests will be provided as side effects.
- the left hand plot is a Q-Q plot (ideally all points should be close to the red line). The KS (Kolmogorov-Smirnov) test examines whether the residuals (in this case simulated) are likely to have been drawn from the nominated distribution (in this case Gaussian).
- the Dispersion test tests whether the standard deviation of the data is equal to that of the simulated data
- the Outlier test tests for the prevalence of outliers (when observed values are outside the simulated range)
- the right hand plot is a residual plot. Ideally, there should be no patterns in the residuals. To help identify any patterns, quantile trends are overlaid; each of the 0.25, 0.5 and 0.75 quantile trends should appear as a flat black line.
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.224, p-value = 0.6209
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.90609, p-value = 0.952
alternative hypothesis: two.sided
$outliers
DHARMa outlier test based on exact binomial test with approximate
expectations
data: simulationOutput
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.224, p-value = 0.6209
alternative hypothesis: two-sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 0.90609, p-value = 0.952
alternative hypothesis: two.sided
DHARMa outlier test based on exact binomial test with approximate
expectations
data: fert_resid1
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided
Test for location of quantiles via qgam
data: simulationOutput
p-value = 0.5887
alternative hypothesis: both
The broom package has an augment method that adds information generated from a fitted model to the modelled data.
# A tibble: 10 × 8
YIELD `center(FERTILIZER)` .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <dw_trnsf> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 84 -112.5 72.2 11.8 0.345 19.6 0.155 0.766
2 80 -87.5 92.5 -12.5 0.248 19.6 0.0953 -0.759
3 90 -62.5 113. -22.8 0.176 18.0 0.186 -1.32
4 154 -37.5 133. 20.9 0.127 18.5 0.101 1.18
5 148 -12.5 153. -5.36 0.103 20.2 0.00509 -0.298
6 169 12.5 174. -4.64 0.103 20.2 0.00382 -0.258
7 206 37.5 194. 12.1 0.127 19.7 0.0337 0.680
8 244 62.5 214. 29.8 0.176 16.1 0.318 1.73
9 212 87.5 234. -22.5 0.248 17.8 0.308 -1.37
10 248 112.5 255. -6.78 0.345 20.1 0.0514 -0.441
## we could then pipe these to ggplot in order to look at residuals
## etc
fert_lm1 |>
augment() |>
ggplot() +
geom_point(aes(y = .resid, x = .fitted))
Cook’s distance (cook.d) and leverage (hat) are displayed in tabular form.
The influence.measures() function does not have a method for glmmTMB. On the surface, there are two options.
- if the glmmTMB model has random effects, we can use a function within the glmmTMB package. Note, at the time of writing, an influence measures method for glmmTMB was not exported. However, it becomes available after issuing source(system.file("other_methods", "influence_mixed.R", package = "glmmTMB")). Whilst this might change at some point down the track, for now, the only way to access this method is to first source it from the package itself.
- importantly, the above experimental method is not available for models that do not have random effects. Nevertheless, given that this is such a simple model, we can simply refit it as an lm.
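A minimal sketch of the second option, assuming the glmmTMB model was fitted as fert_glmmTMB <- glmmTMB(YIELD ~ FERTILIZER, data = fert): refit the same fixed-effects formula with lm() so that influence.measures() has a method to dispatch on. The sketch refits directly from the table data so it runs without glmmTMB installed.

```r
## Data transcribed from the fertiliser table above
fert <- data.frame(
  FERTILIZER = seq(25, 250, by = 25),
  YIELD = c(84, 80, 90, 154, 148, 169, 206, 244, 212, 248)
)

## Refit the (fixed-effects only) model with lm(); for a fitted glmmTMB
## object, the same formula could be recovered with formula(fert_glmmTMB)
fert_lm_refit <- lm(YIELD ~ FERTILIZER, data = fert)

## Influence measures now work as for any lm fit
infl <- influence.measures(fert_lm_refit)
round(infl$infmat[, c("cook.d", "hat")], 5)
```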
Influence measures of
lm(formula = fert_glmmTMB) :
dfb.1_ dfb.FERT dffit cov.r cook.d hat inf
1 5.39e-01 -0.4561 0.5411 1.713 0.15503 0.345
2 -4.15e-01 0.3277 -0.4239 1.497 0.09527 0.248
3 -6.01e-01 0.4237 -0.6454 0.969 0.18608 0.176
4 3.80e-01 -0.2145 0.4634 1.022 0.10137 0.127
5 -5.77e-02 0.0163 -0.0949 1.424 0.00509 0.103
6 -2.50e-02 -0.0141 -0.0821 1.432 0.00382 0.103
7 -1.80e-17 0.1159 0.2503 1.329 0.03374 0.127
8 -2.19e-01 0.6184 0.9419 0.623 0.31796 0.176
9 3.29e-01 -0.6486 -0.8390 1.022 0.30844 0.248
10 1.51e-01 -0.2559 -0.3035 1.900 0.05137 0.345 *
YIELD FERTILIZER .hat .sigma .cooksd .fitted .resid
1 84 25 0.3454545 19.55115 0.155034335 72.21818 11.781818
2 80 50 0.2484848 19.56598 0.095267982 92.50303 -12.503030
3 90 75 0.1757576 17.95943 0.186081798 112.78788 -22.787879
4 154 100 0.1272727 18.46227 0.101366743 133.07273 20.927273
5 148 125 0.1030303 20.19832 0.005091410 153.35758 -5.357576
6 169 150 0.1030303 20.22650 0.003822883 173.64242 -4.642424
7 206 175 0.1272727 19.71511 0.033735022 193.92727 12.072727
8 244 200 0.1757576 16.08585 0.317962021 214.21212 29.787879
9 212 225 0.2484848 17.78582 0.308435562 234.49697 -22.496970
10 248 250 0.3454545 20.06254 0.051368341 254.78182 -6.781818
.stdresid
1 0.7664845
2 -0.7591147
3 -1.3211052
4 1.1790558
5 -0.2977422
6 -0.2579983
7 0.6801851
8 1.7269234
9 -1.3658913
10 -0.4412017
The performance package provides a similar set of visual diagnostics:
- the top left plot is a Q-Q normal plot.
- the top right plot compares the distribution (density) of the observed data (blue) with a normal (or other nominated) distribution (green)
- the middle two plots are residual and standardised residual plots. Ideally, these should show no patterns in the residuals
- the bottom left plot provides a frequency bar chart of Cook’s distance values.
Conclusions:
- there are no alarming signs from any of the diagnostics
Although an exploration of model residuals can provide important insights into the goodness of fit and conformity of a model to the underlying assumptions, diagnosing issues is a bit of a fine art. This difficulty is exacerbated for some models (e.g. logistic regression models), where it is difficult to separate problematic patterns from the patterns expected of that particular model.
The DHARMa package generates standardised residuals via simulation and uses these as the basis of a range of tools to diagnose common modelling issues including outliers, heterogeneity, over-dispersion and autocorrelation.
New observations simulated from the fitted model are used to calculate a cumulative distribution function (CDF) that describes the probability profile of each observation. Thereafter, the residual of an observation is calculated as the value of the CDF that corresponds to the actual observed value:
- a value of 0 indicates that the observed value was less than all simulated values
- a value of 1 indicates that the observed value was greater than all simulated values
- a value of 0.5 indicates that half of the simulated values were greater than the observed value
For a correctly specified model, these quantile residuals should be uniformly distributed.
This approach ensures that all residuals have the same interpretation irrespective of the model and distribution selected.
Exploring DHARMa residuals begins with running the simulateResiduals() function. If this is done with plot=TRUE, a pair of diagnostic plots with a range of diagnostic tests will be provided as side effects.
- the left hand plot is a Q-Q plot (ideally all points should be close to the red line). The KS (Kolmogorov-Smirnov) test examines whether the residuals (in this case simulated) are likely to have been drawn from the nominated distribution (in this case Gaussian).
- the Dispersion test tests whether the standard deviation of the data is equal to that of the simulated data
- the Outlier test tests for the prevalence of outliers (when observed values are outside the simulated range)
- the right hand plot is a residual plot. Ideally, there should be no patterns in the residuals. To help identify any patterns, quantile trends are overlaid; each of the 0.25, 0.5 and 0.75 quantile trends should appear as a flat black line.
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.228, p-value = 0.5994
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.1326, p-value = 0.688
alternative hypothesis: two.sided
$outliers
DHARMa outlier test based on exact binomial test with approximate
expectations
data: simulationOutput
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.228, p-value = 0.5994
alternative hypothesis: two-sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.1326, p-value = 0.688
alternative hypothesis: two.sided
DHARMa outlier test based on exact binomial test with approximate
expectations
data: fert_resid
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided
Test for location of quantiles via qgam
data: simulationOutput
p-value = 0.3161
alternative hypothesis: both
The broom package has an augment method that adds information generated from a fitted model to the modelled data.
# A tibble: 10 × 8
YIELD FERTILIZER .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 84 25 72.2 11.8 0.345 19.6 0.155 0.766
2 80 50 92.5 -12.5 0.248 19.6 0.0953 -0.759
3 90 75 113. -22.8 0.176 18.0 0.186 -1.32
4 154 100 133. 20.9 0.127 18.5 0.101 1.18
5 148 125 153. -5.36 0.103 20.2 0.00509 -0.298
6 169 150 174. -4.64 0.103 20.2 0.00382 -0.258
7 206 175 194. 12.1 0.127 19.7 0.0337 0.680
8 244 200 214. 29.8 0.176 16.1 0.318 1.73
9 212 225 234. -22.5 0.248 17.8 0.308 -1.37
10 248 250 255. -6.78 0.345 20.1 0.0514 -0.441
## we could then pipe these to ggplot in order to look at residuals
## etc
fert_glmmTMB |>
lm() |>
augment() |>
ggplot() +
geom_point(aes(y = .resid, x = .fitted))
Cook’s distance (cook.d) and leverage (hat) are displayed in tabular form.
The influence.measures() function does not have a method for glmmTMB. On the surface, there are two options.
- if the glmmTMB model has random effects, we can use a function within the glmmTMB package. Note, at the time of writing, an influence measures method for glmmTMB was not exported. However, it becomes available after issuing source(system.file("other_methods", "influence_mixed.R", package = "glmmTMB")). Whilst this might change at some point down the track, for now, the only way to access this method is to first source it from the package itself.
- importantly, the above experimental method is not available for models that do not have random effects. Nevertheless, given that this is such a simple model, we can simply refit it as an lm.
Influence measures of
lm(formula = fert_glmmTMB1) :
dfb.1_ dfb.c.FE dffit cov.r cook.d hat inf
1 0.2911 -0.4561 0.5411 1.713 0.15503 0.345
2 -0.2689 0.3277 -0.4239 1.497 0.09527 0.248
3 -0.4868 0.4237 -0.6454 0.969 0.18608 0.176
4 0.4107 -0.2145 0.4634 1.022 0.10137 0.127
5 -0.0935 0.0163 -0.0949 1.424 0.00509 0.103
6 -0.0809 -0.0141 -0.0821 1.432 0.00382 0.103
7 0.2219 0.1159 0.2503 1.329 0.03374 0.127
8 0.7105 0.6184 0.9419 0.623 0.31796 0.176
9 -0.5322 -0.6486 -0.8390 1.022 0.30844 0.248
10 -0.1633 -0.2559 -0.3035 1.900 0.05137 0.345 *
YIELD center(FERTILIZER) .hat .sigma .cooksd .fitted .resid
1 84 -112.5 0.3454545 19.55115 0.155034335 72.21818 11.781818
2 80 -87.5 0.2484848 19.56598 0.095267982 92.50303 -12.503030
3 90 -62.5 0.1757576 17.95943 0.186081798 112.78788 -22.787879
4 154 -37.5 0.1272727 18.46227 0.101366743 133.07273 20.927273
5 148 -12.5 0.1030303 20.19832 0.005091410 153.35758 -5.357576
6 169 12.5 0.1030303 20.22650 0.003822883 173.64242 -4.642424
7 206 37.5 0.1272727 19.71511 0.033735022 193.92727 12.072727
8 244 62.5 0.1757576 16.08585 0.317962021 214.21212 29.787879
9 212 87.5 0.2484848 17.78582 0.308435562 234.49697 -22.496970
10 248 112.5 0.3454545 20.06254 0.051368341 254.78182 -6.781818
.stdresid
1 0.7664845
2 -0.7591147
3 -1.3211052
4 1.1790558
5 -0.2977422
6 -0.2579983
7 0.6801851
8 1.7269234
9 -1.3658913
10 -0.4412017
The performance package provides a similar set of visual diagnostics:
- the top left plot is a Q-Q normal plot.
- the top right plot compares the distribution (density) of the observed data (blue) with a normal (or other nominated) distribution (green)
- the middle two plots are residual and standardised residual plots. Ideally, these should show no patterns in the residuals
- the bottom left plot provides a frequency bar chart of Cook’s distance values.
Conclusions:
- there are no alarming signs from any of the diagnostics
Although an exploration of model residuals can provide important insights into the goodness of fit and conformity of a model to the underlying assumptions, diagnosing issues is a bit of a fine art. This difficulty is exacerbated for some models (e.g. logistic regression models), where it is difficult to separate problematic patterns from the patterns expected of that particular model.
The DHARMa package generates standardised residuals via simulation and uses these as the basis of a range of tools to diagnose common modelling issues including outliers, heterogeneity, over-dispersion and autocorrelation.
New observations simulated from the fitted model are used to calculate a cumulative distribution function (CDF) that describes the probability profile of each observation. Thereafter, the residual of an observation is calculated as the value of the CDF that corresponds to the actual observed value:
- a value of 0 indicates that the observed value was less than all simulated values
- a value of 1 indicates that the observed value was greater than all simulated values
- a value of 0.5 indicates that half of the simulated values were greater than the observed value
For a correctly specified model, these quantile residuals should be uniformly distributed.
This approach ensures that all residuals have the same interpretation irrespective of the model and distribution selected.
Exploring DHARMa residuals begins with running the simulateResiduals() function. If this is done with plot=TRUE, a pair of diagnostic plots with a range of diagnostic tests will be provided as side effects.
- the left hand plot is a Q-Q plot (ideally all points should be close to the red line). The KS (Kolmogorov-Smirnov) test examines whether the residuals (in this case simulated) are likely to have been drawn from the nominated distribution (in this case Gaussian).
- the Dispersion test tests whether the standard deviation of the data is equal to that of the simulated data
- the Outlier test tests for the prevalence of outliers (when observed values are outside the simulated range)
- the right hand plot is a residual plot. Ideally, there should be no patterns in the residuals. To help identify any patterns, quantile trends are overlayed. Ideally, there should be a flat black line at each of the quantiles of 0.25, 0.5 and 0.75.
$uniformity
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.228, p-value = 0.5994
alternative hypothesis: two-sided
$dispersion
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.1326, p-value = 0.688
alternative hypothesis: two.sided
$outliers
DHARMa outlier test based on exact binomial test with approximate
expectations
data: simulationOutput
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
Exact one-sample Kolmogorov-Smirnov test
data: simulationOutput$scaledResiduals
D = 0.228, p-value = 0.5994
alternative hypothesis: two-sided
DHARMa nonparametric dispersion test via sd of residuals fitted vs.
simulated
data: simulationOutput
dispersion = 1.1326, p-value = 0.688
alternative hypothesis: two.sided
DHARMa outlier test based on exact binomial test with approximate
expectations
data: fert_resid1
outliers at both margin(s) = 0, observations = 10, p-value = 1
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
0.0000000 0.3084971
sample estimates:
frequency of outliers (expected: 0.00796812749003984 )
0
DHARMa zero-inflation test via comparison to expected zeros with
simulation under H0 = fitted model
data: simulationOutput
ratioObsSim = NaN, p-value = 1
alternative hypothesis: two.sided
Test for location of quantiles via qgam
data: simulationOutput
p-value = 0.3161
alternative hypothesis: both
The broom package has an augment method that adds information generated from a fitted model to the modelled data.
# A tibble: 10 × 8
YIELD `center(FERTILIZER)` .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <dw_trnsf> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 84 -112.5 72.2 11.8 0.345 19.6 0.155 0.766
2 80 -87.5 92.5 -12.5 0.248 19.6 0.0953 -0.759
3 90 -62.5 113. -22.8 0.176 18.0 0.186 -1.32
4 154 -37.5 133. 20.9 0.127 18.5 0.101 1.18
5 148 -12.5 153. -5.36 0.103 20.2 0.00509 -0.298
6 169 12.5 174. -4.64 0.103 20.2 0.00382 -0.258
7 206 37.5 194. 12.1 0.127 19.7 0.0337 0.680
8 244 62.5 214. 29.8 0.176 16.1 0.318 1.73
9 212 87.5 234. -22.5 0.248 17.8 0.308 -1.37
10 248 112.5 255. -6.78 0.345 20.1 0.0514 -0.441
## we could then pipe these to ggplot in order to look at residuals
## etc
fert_glmmTMB1 |>
  augment() |>
  ggplot() +
  geom_point(aes(y = .resid, x = .fitted))
7 Model outputs
Prior to examining the estimated coefficients and any associated hypothesis tests, it is a good idea to explore the fitted trends. Not only does this give you a sense for the overall outcomes (and help you interpret the model summaries etc), it also provides an opportunity to reflect on whether the model has yielded sensible patterns.
There are numerous ways to explore the fitted trends and these are largely based on two types of effects:
- conditional effects: predictions that are conditioned on certain levels (typically the reference level) of factors. For example, the trend/effects of one predictor at the first level (or first combination) of other categorical predictor(s).
- marginal effects: predictions that are marginalised (= averaged) over all levels of the factors. For example, the trend/effects of one predictor averaged across all levels (or combinations) of other predictors.
There are also numerous routines and packages to support these fitted trends (partial plots).
| Package | Function | Type | Notes |
|---|---|---|---|
| sjPlot | `plot_model(type = 'eff')` | Marginal means | |
| effects | `allEffects()` | Marginal means | |
| ggeffects | `ggeffects()` | Marginal means | calls `effects()` |
| ggeffects | `ggpredict()` | Conditional means | calls `predict()` |
| ggeffects | `ggemmeans()` | Marginal means | calls `emmeans()` |
Note, in the absence of categorical predictors, they will all yield the same results.
Warning: "prediction" and "classification" are currently not supported by the
`predict` argument for `glmmTMB` models.
Changing to `predict="expectation"`.
8 Model investigation / hypothesis testing
Call:
lm(formula = YIELD ~ FERTILIZER, data = fert)
Residuals:
Min 1Q Median 3Q Max
-22.79 -11.07 -5.00 12.00 29.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.93333 12.97904 4.001 0.00394 **
FERTILIZER 0.81139 0.08367 9.697 1.07e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19 on 8 degrees of freedom
Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
Conclusions:
- the estimated y-intercept (value of Yield when Fertiliser level is 0) is 51.93. Note, as a Fertiliser value of 0 is outside the range of the collected data, this intercept does not really have any useful interpretation (unless we are happy to extrapolate). If we had centred the Fertiliser predictor (either before modelling or as part of the model - lm(YIELD ~ scale(FERTILIZER, scale = FALSE), data = fert)), then the y-intercept would have had a sensible interpretation: the expected Yield at the average Fertiliser level. Nevertheless, either way, the hypothesis test associated with the intercept is rarely of any value.
- the estimated slope is 0.81. For every one unit increase in Fertiliser concentration, Yield is expected to increase by 0.81 units.
- the estimated slope divided by the estimated uncertainty in the slope (Std. Error) gives a t-value of 9.7. If we compare that to a t-distribution with 8 degrees of freedom, we get a p-value < 0.05. Hence we would reject the null hypothesis of no relationship between Fertiliser concentration and Yield.
- the \(R^2\) value is 0.92. Hence, 92 percent of the variation in Yield is explained by its relationship to Fertiliser.
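The t-value and p-value in the summary can be reproduced by hand from the reported estimate and standard error (values copied from the summary table above):

```r
## slope t-statistic and two-sided p-value from the reported summary
est <- 0.81139   # slope estimate
se  <- 0.08367   # standard error
df  <- 8         # residual degrees of freedom
t_val <- est / se
p_val <- 2 * pt(-abs(t_val), df = df)
round(t_val, 2)    # 9.7
signif(p_val, 3)   # matches the reported 1.07e-05
```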
Arguably, the confidence intervals associated with each of the estimates are more useful than the p-values. P-values indicate the probability of obtaining a result at least as extreme as the one observed if there were no effect; they do not provide a measure of the magnitude (or importance) of the effect. The slope is the point estimate of the magnitude of the effect. Confidence intervals then provide insights into the broader range of plausible effects. For example, in addition to describing the estimated effect size (slope), we could discuss the implications of a slope with a magnitude as low as the lower confidence limit or as high as the upper confidence limit.
Conclusions:
- we are 95% confident that the interval (0.62, 1.00) contains the true slope.
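These limits can be reproduced by hand using the t-distribution (values copied from the summary above):

```r
## 95% CI for the slope: estimate +/- t(0.975, df) * SE
est <- 0.81139
se  <- 0.08367
df  <- 8
ci <- est + qt(c(0.025, 0.975), df = df) * se
round(ci, 3)   # 0.618 1.004
```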
Each modelling routine (and package) may store and present its results in a different structure and format. In an attempt to unify the output, the broom package includes a tidy() method that produces more consistent outputs (structure and names) across modelling routines.
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 51.9 13.0 4.00 0.00394 22.0 81.9
2 FERTILIZER 0.811 0.0837 9.70 0.0000107 0.618 1.00
We can pipe this to kable if we want more formatted output.
| Parameter | Coefficient | SE | 95% CI | t(8) | p |
|---|---|---|---|---|---|
| (Intercept) | 51.93 | 12.98 | (22.00, 81.86) | 4.00 | 0.004 |
| FERTILIZER | 0.81 | 0.08 | (0.62, 1.00) | 9.70 | < .001 |
For HTML output, the sjPlot package has a method for producing comparable and nicely formatted outputs across different modelling routines.
Call:
lm(formula = YIELD ~ center(FERTILIZER), data = fert)
Residuals:
Min 1Q Median 3Q Max
-22.79 -11.07 -5.00 12.00 29.79
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 163.50000 6.00813 27.213 3.58e-09 ***
center(FERTILIZER) 0.81139 0.08367 9.697 1.07e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 19 on 8 degrees of freedom
Multiple R-squared: 0.9216, Adjusted R-squared: 0.9118
F-statistic: 94.04 on 1 and 8 DF, p-value: 1.067e-05
Conclusions:
- the estimated y-intercept is 163.5. Since the Fertiliser predictor has been centred (lm(YIELD ~ center(FERTILIZER), data = fert)), the intercept now has a sensible interpretation: it is the expected Yield at the average Fertiliser level. Nevertheless, the hypothesis test associated with the intercept is rarely of any value.
- the estimated slope is 0.81. For every one unit increase in Fertiliser concentration, Yield is expected to increase by 0.81 units.
- the estimated slope divided by the estimated uncertainty in the slope (Std. Error) gives a t-value of 9.7. If we compare that to a t-distribution with 8 degrees of freedom, we get a p-value < 0.05. Hence we would reject the null hypothesis of no relationship between Fertiliser concentration and Yield.
- the \(R^2\) value is 0.92. Hence, 92 percent of the variation in Yield is explained by its relationship to Fertiliser.
Arguably, the confidence intervals associated with each of the estimates are more useful than the p-values. P-values indicate the probability of obtaining a result at least as extreme as the one observed if there were no effect; they do not provide a measure of the magnitude (or importance) of the effect. The slope is the point estimate of the magnitude of the effect. Confidence intervals then provide insights into the broader range of plausible effects. For example, in addition to describing the estimated effect size (slope), we could discuss the implications of a slope with a magnitude as low as the lower confidence limit or as high as the upper confidence limit.
2.5 % 97.5 %
(Intercept) 149.6452370 177.354763
center(FERTILIZER) 0.6184496 1.004338
Conclusions:
- we are 95% confident that the interval (0.62, 1.00) contains the true slope.
Each modelling routine (and package) may store and present its results in a different structure and format. In an attempt to unify the output, the broom package includes a tidy() method that produces more consistent outputs (structure and names) across modelling routines.
# A tibble: 2 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 163. 6.01 27.2 3.58e-9 150. 177.
2 center(FERTILIZER) 0.811 0.0837 9.70 1.07e-5 0.618 1.00
We can pipe this to kable if we want more formatted output.
| Parameter | Coefficient | SE | 95% CI | t(8) | p |
|---|---|---|---|---|---|
| (Intercept) | 163.50 | 6.01 | (149.65, 177.35) | 27.21 | < .001 |
| center(FERTILIZER) | 0.81 | 0.08 | (0.62, 1.00) | 9.70 | < .001 |
For HTML output, the sjPlot package has a method for producing comparable and nicely formatted outputs across different modelling routines.
# warning this is only appropriate for html output
fert_lm1 |> sjPlot::tab_model(show.se = TRUE, show.aic = TRUE)| YIELD | ||||
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 163.50 | 6.01 | 149.65 – 177.35 | <0.001 |
| center(FERTILIZER) | 0.81 | 0.08 | 0.62 – 1.00 | <0.001 |
| Observations | 10 | |||
| R2 / R2 adjusted | 0.922 / 0.912 | |||
| AIC | 91.035 | |||
Family: gaussian ( identity )
Formula: YIELD ~ FERTILIZER
Data: fert
AIC BIC logLik deviance df.resid
91.0 91.9 -42.5 85.0 7
Dispersion estimate for gaussian family (sigma^2): 289
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 51.93332 11.60880 4.474 7.69e-06 ***
FERTILIZER 0.81139 0.07484 10.842 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusions:
- the estimated y-intercept (value of Yield when Fertiliser level is 0) is 51.93. Note, as a Fertiliser value of 0 is outside the range of the collected data, this intercept does not really have any useful interpretation (unless we are happy to extrapolate). If we had centred the Fertiliser predictor (either before modelling or as part of the model - glmmTMB(YIELD ~ scale(FERTILIZER, scale = FALSE), data = fert)), then the y-intercept would have had a sensible interpretation: the expected Yield at the average Fertiliser level. Nevertheless, either way, the hypothesis test associated with the intercept is rarely of any value.
- the estimated slope is 0.81. For every one unit increase in Fertiliser concentration, Yield is expected to increase by 0.81 units.
- the estimated slope divided by the estimated uncertainty in the slope (Std. Error) gives a z-value of 10.84. If we compare that to a standard normal distribution, we get a p-value < 0.05. Hence we would reject the null hypothesis of no relationship between Fertiliser concentration and Yield.
- the \(R^2\) value is 0.93. Hence, 93 percent of the variation in Yield is explained by its relationship to Fertiliser.
Arguably, the confidence intervals associated with each of the estimates are more useful than the p-values. P-values indicate the probability of obtaining a result at least as extreme as the one observed if there were no effect; they do not provide a measure of the magnitude (or importance) of the effect. The slope is the point estimate of the magnitude of the effect. Confidence intervals then provide insights into the broader range of plausible effects. For example, in addition to describing the estimated effect size (slope), we could discuss the implications of a slope with a magnitude as low as the lower confidence limit or as high as the upper confidence limit.
2.5 % 97.5 % Estimate
(Intercept) 29.180483 74.6861485 51.933316
FERTILIZER 0.664716 0.9580721 0.811394
- we are 95% confident that the interval (0.66, 0.96) contains the true slope.
Each modelling routine (and package) may store and present its results in a different structure and format. In an attempt to unify the output, the broom package includes a tidy() method that produces more consistent outputs (structure and names) across modelling routines.
# A tibble: 2 × 9
effect component term estimate std.error statistic p.value conf.low
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed cond (Intercept) 51.9 11.6 4.47 7.69e- 6 29.2
2 fixed cond FERTILIZER 0.811 0.0748 10.8 2.17e-27 0.665
# ℹ 1 more variable: conf.high <dbl>
We can pipe this to kable if we want more formatted output.
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | z | p
-------------------------------------------------------------------
(Intercept) | 51.93 | 11.61 | [29.18, 74.69] | 4.47 | < .001
FERTILIZER | 0.81 | 0.07 | [ 0.66, 0.96] | 10.84 | < .001
# Dispersion
Parameter | Coefficient | 95% CI
------------------------------------------
(Intercept) | 16.99 | [10.96, 26.34]
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald z-distribution approximation.
Random effect variances not available. Returned R2 does not account for random effects.
# Indices of model performance
AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | RMSE | Sigma
--------------------------------------------------------------------
91.035 | 95.035 | 91.943 | | 1.000 | 16.994 | 16.994
For HTML output, the sjPlot package has a method for producing comparable and nicely formatted outputs across different modelling routines.
# warning this is only appropriate for html output
fert_glmmTMB |> sjPlot::tab_model(show.se = TRUE, show.aic = TRUE)
Random effect variances not available. Returned R2 does not account for random effects.
| YIELD | ||||
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 51.93 | 11.61 | 29.18 – 74.69 | <0.001 |
| FERTILIZER | 0.81 | 0.07 | 0.66 – 0.96 | <0.001 |
| Observations | 10 | |||
| R2 conditional / R2 marginal | NA / 1.000 | |||
| AIC | 91.035 | |||
Family: gaussian ( identity )
Formula: YIELD ~ center(FERTILIZER)
Data: fert
AIC BIC logLik deviance df.resid
91.0 91.9 -42.5 85.0 7
Dispersion estimate for gaussian family (sigma^2): 289
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 163.50000 5.37383 30.43 <2e-16 ***
center(FERTILIZER) 0.81139 0.07484 10.84 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusions:
- the estimated y-intercept is 163.5. Since the Fertiliser predictor has been centred (glmmTMB(YIELD ~ center(FERTILIZER), data = fert)), the intercept now has a sensible interpretation: it is the expected Yield at the average Fertiliser level. Nevertheless, the hypothesis test associated with the intercept is rarely of any value.
- the estimated slope is 0.81. For every one unit increase in Fertiliser concentration, Yield is expected to increase by 0.81 units.
- the estimated slope divided by the estimated uncertainty in the slope (Std. Error) gives a z-value of 10.84. If we compare that to a standard normal distribution, we get a p-value < 0.05. Hence we would reject the null hypothesis of no relationship between Fertiliser concentration and Yield.
- the \(R^2\) value is 0.93. Hence, 93 percent of the variation in Yield is explained by its relationship to Fertiliser.
Arguably, the confidence intervals associated with each of the estimates are more useful than the p-values. P-values indicate the probability of obtaining a result at least as extreme as the one observed if there were no effect; they do not provide a measure of the magnitude (or importance) of the effect. The slope is the point estimate of the magnitude of the effect. Confidence intervals then provide insights into the broader range of plausible effects. For example, in addition to describing the estimated effect size (slope), we could discuss the implications of a slope with a magnitude as low as the lower confidence limit or as high as the upper confidence limit.
2.5 % 97.5 % Estimate
(Intercept) 152.9674858 174.032516 163.500001
center(FERTILIZER) 0.6647159 0.958072 0.811394
- we are 95% confident that the interval (0.66, 0.96) contains the true slope.
Each modelling routine (and package) may store and present its results in a different structure and format. In an attempt to unify the output, the broom package includes a tidy() method that produces more consistent outputs (structure and names) across modelling routines.
# A tibble: 2 × 9
effect component term estimate std.error statistic p.value conf.low
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 fixed cond (Intercept) 164. 5.37 30.4 2.55e-203 153.
2 fixed cond center(FERTI… 0.811 0.0748 10.8 2.17e- 27 0.665
# ℹ 1 more variable: conf.high <dbl>
We can pipe this to kable if we want more formatted output.
# Fixed Effects
Parameter | Coefficient | SE | 95% CI | z | p
---------------------------------------------------------------------------
(Intercept) | 163.50 | 5.37 | [152.97, 174.03] | 30.43 | < .001
center(FERTILIZER) | 0.81 | 0.07 | [ 0.66, 0.96] | 10.84 | < .001
# Dispersion
Parameter | Coefficient | 95% CI
------------------------------------------
(Intercept) | 16.99 | [10.96, 26.34]
Uncertainty intervals (equal-tailed) and p-values (two-tailed) computed
using a Wald z-distribution approximation.
Random effect variances not available. Returned R2 does not account for random effects.
# Indices of model performance
AIC | AICc | BIC | R2 (cond.) | R2 (marg.) | RMSE | Sigma
--------------------------------------------------------------------
91.035 | 95.035 | 91.943 | | 1.000 | 16.994 | 16.994
For HTML output, the sjPlot package has a method for producing comparable and nicely formatted outputs across different modelling routines.
# warning this is only appropriate for html output
fert_glmmTMB1 |> sjPlot::tab_model(show.se = TRUE, show.aic = TRUE)
Random effect variances not available. Returned R2 does not account for random effects.
| YIELD | ||||
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 163.50 | 5.37 | 152.97 – 174.03 | <0.001 |
| center(FERTILIZER) | 0.81 | 0.07 | 0.66 – 0.96 | <0.001 |
| Observations | 10 | |||
| R2 conditional / R2 marginal | NA / 1.000 | |||
| AIC | 91.035 | |||
9 Predictions
For simple models prediction is essentially taking the model formula complete with parameter (coefficient) estimates and solving for new values of the predictor. To explore this, we will use the fitted model to predict Yield for a Fertiliser concentration of 110.
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## using the predict function
fert_lm |> predict(newdata = newdata)
       1
141.1867
fit lwr upr
1 141.1867 126.3506 156.0227
Conclusions:
- we expect (predict) a Yield of 141.19 associated with a Fertiliser level of 110.
- confidence intervals represent the interval in which we are 95% confident the true Mean value of the population falls.
- prediction intervals (not shown, but described for comparison) represent the interval in which we are 95% confident a single observation drawn from the population will fall.
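The difference can be seen via the `interval` argument of `predict()` (sketched here on assumed toy data; the same calls apply to `fert_lm`):

```r
## Confidence vs prediction intervals from predict() (toy data)
set.seed(1)
d <- data.frame(x = seq(25, 250, by = 25))
d$y <- 52 + 0.81 * d$x + rnorm(10, sd = 19)
m <- lm(y ~ x, data = d)
nd <- data.frame(x = 110)
(ci <- predict(m, newdata = nd, interval = "confidence"))
(pi <- predict(m, newdata = nd, interval = "prediction"))
## the prediction interval is always the wider of the two, since it also
## incorporates the residual variation of individual observations
```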
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
## Perform matrix multiplication
(pred <- coef(fert_lm) %*% t(Xmat))
            1
[1,] 141.1867
## Calculate the standard error
se <- sqrt(diag(Xmat %*% vcov(fert_lm) %*% t(Xmat)))
## Calculate the confidence intervals
as.numeric(pred) + outer(se, qt(df = df.residual(fert_lm), c(0.025, 0.975)))
      [,1]     [,2]
1 126.3506 156.0227
The emmeans package has a set of routines for estimating marginal means from fitted models.
FERTILIZER emmean SE df lower.CL upper.CL
110 141 6.43 8 126 156
Confidence level used: 0.95
Note, the above outputs are rounded only for the purpose of constructing the printed table on screen. If we were to capture the output of the emmeans() function, the values would not be rounded.
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 6.43 | [126.35, 156.02]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## OR
newdata <- fert_lm |> insight::get_datagrid(at = list(FERTILIZER = 110))
newdata |> estimate_expectation()
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 6.43 | [126.35, 156.02]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## using the predict function
fert_lm1 |> predict(newdata = newdata)
       1
141.1867
fit lwr upr
1 141.1867 126.3506 156.0227
Conclusions:
- we expect (predict) a Yield of 141.19 associated with a Fertiliser level of 110.
- confidence intervals represent the interval in which we are 95% confident the true Mean value of the population falls.
- prediction intervals (not shown, but described for comparison) represent the interval in which we are 95% confident a single observation drawn from the population will fall.
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110 - mean(fert$FERTILIZER))
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
## Perform matrix multiplication
(pred <- coef(fert_lm1) %*% t(Xmat))
            1
[1,] 141.1867
## Calculate the standard error
se <- sqrt(diag(Xmat %*% vcov(fert_lm1) %*% t(Xmat)))
## Calculate the confidence intervals
as.numeric(pred) + outer(se, qt(df = df.residual(fert_lm1), c(0.025, 0.975)))
      [,1]     [,2]
1 126.3506 156.0227
The emmeans package has a set of routines for estimating marginal means from fitted models.
FERTILIZER emmean SE df lower.CL upper.CL
110 141 6.43 8 126 156
Confidence level used: 0.95
Note, the above outputs are rounded only for the purpose of constructing the printed table on screen. If we were to capture the output of the emmeans() function, the values would not be rounded.
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 6.43 | [126.35, 156.02]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## OR
newdata <- fert_lm1 |> insight::get_datagrid(at = list(FERTILIZER = 110))
newdata |> estimate_expectation()
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 6.43 | [126.35, 156.02]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## using the predict function
fert_glmmTMB |> predict(newdata = newdata)
[1] 141.1867
$fit
[1] 141.1867
$se.fit
[1] 5.754434
Conclusions:
- we expect (predict) a Yield of 141.19 associated with a Fertiliser level of 110.
- confidence intervals represent the interval in which we are 95% confident the true Mean value of the population falls.
- prediction intervals (not shown, but described for comparison) represent the interval in which we are 95% confident a single observation drawn from the population will fall.
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
## Perform matrix multiplication
(pred <- fixef(fert_glmmTMB)[[1]] %*% t(Xmat))
            1
[1,] 141.1867
## Calculate the standard error
se <- sqrt(diag(Xmat %*% vcov(fert_glmmTMB)[[1]] %*% t(Xmat)))
## Calculate the confidence intervals
as.numeric(pred) + outer(se, qt(df = df.residual(fert_glmmTMB), c(0.025, 0.975)))
      [,1]     [,2]
1 127.5796 154.7937
The emmeans package has a set of routines for estimating marginal means from fitted models.
## using emmeans
newdata <- list(FERTILIZER = 110)
fert_glmmTMB |> emmeans(~FERTILIZER, at = newdata)
 FERTILIZER emmean   SE df lower.CL upper.CL
110 141 5.75 7 128 155
Confidence level used: 0.95
Note, the above outputs are rounded only for the purpose of constructing the printed table on screen. If we were to capture the output of the emmeans() function, the values would not be rounded.
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 5.75 | [129.91, 152.47]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## OR
newdata <- fert_glmmTMB |> insight::get_datagrid(at = list(FERTILIZER = 110))
newdata |> estimate_expectation()
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 5.75 | [129.91, 152.47]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110)
## using the predict function
fert_glmmTMB1 |> predict(newdata = newdata)
[1] 141.1867
$fit
[1] 141.1867
$se.fit
[1] 5.754434
Conclusions:
- we expect (predict) a Yield of 141.19 associated with a Fertiliser level of 110.
- confidence intervals represent the interval in which we are 95% confident the true Mean value of the population falls.
- prediction intervals (not shown, but described for comparison) represent the interval in which we are 95% confident a single observation drawn from the population will fall.
## establish a data set that defines the new data to predict against
newdata <- data.frame(FERTILIZER = 110 - mean(fert$FERTILIZER))
## Establish an appropriate model matrix
Xmat <- model.matrix(~FERTILIZER, data = newdata)
## Perform matrix multiplication
(pred <- fixef(fert_glmmTMB1)[[1]] %*% t(Xmat))
            1
[1,] 141.1867
## Calculate the standard error
se <- sqrt(diag(Xmat %*% vcov(fert_glmmTMB1)[[1]] %*% t(Xmat)))
## Calculate the confidence intervals
as.numeric(pred) + outer(se, qt(df = df.residual(fert_glmmTMB1), c(0.025, 0.975)))
      [,1]     [,2]
1 127.5796 154.7937
The emmeans package has a set of routines for estimating marginal means from fitted models.
## using emmeans
newdata <- list(FERTILIZER = 110)
fert_glmmTMB1 |> emmeans(~FERTILIZER, at = newdata)
 FERTILIZER emmean   SE df lower.CL upper.CL
110 141 5.75 7 128 155
Confidence level used: 0.95
Note, the above outputs are rounded only for the purpose of constructing the printed table on screen. If we were to capture the output of the emmeans() function, the values would not be rounded.
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 5.75 | [129.91, 152.47]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
## OR
newdata <- fert_glmmTMB1 |> insight::get_datagrid(at = list(FERTILIZER = 110))
newdata |> estimate_expectation()Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
110.00 | 141.19 | 5.75 | [129.91, 152.47]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(110)
10 Additional analyses
## testing a specific hypothesis
## Probability of getting our estimate if slope was 1
fert_lm |> multcomp::glht(linfct = c("FERTILIZER == 1")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = YIELD ~ FERTILIZER, data = fert)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
FERTILIZER == 1 0.81139 0.08367 -2.254 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
## We can't ask for the probability that the slope equals a given value in a
## frequentist framework. If we wanted to know the probability that the slope
## was greater than 1, the closest we could get is
fert_lm |> multcomp::glht(linfct = c("FERTILIZER >= 0.9")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = YIELD ~ FERTILIZER, data = fert)
Linear Hypotheses:
Estimate Std. Error t value Pr(<t)
FERTILIZER >= 0.9 0.81139 0.08367 -1.059 0.16
(Adjusted p values reported -- single-step method)
## ## testing a specific hypothesis
## ## Probability of getting our estimate if the slope was 1
## fert_lm |> brms::hypothesis("FERTILIZER = 1")
## ## Can't ask the probability that the slope is equal to something in a frequentist framework
## ## If we wanted to know the probability that the slope was greater than
## ## 1, the closest we could get is
## fert_lm |> brms::hypothesis("FERTILIZER > 0.9")
We could explore the expected change in Yield associated with a specific increase in Fertiliser. For example, how much do we expect Yield to increase when increasing Fertiliser concentration from 100 to 200?
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 9.697 <.0001
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100
Confidence level used: 0.95
contrast estimate SE df lower.CL upper.CL t.ratio
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100 9.697
p.value
<.0001
Confidence level used: 0.95
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 9.697 <.0001
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100
Confidence level used: 0.95
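For a straight-line model, a contrast like the one above is simply the slope multiplied by the difference in predictor values (here 200 - 100 = 100). A base-R sketch on toy data (not the fertiliser data):

```r
## Base-R sketch: the difference between two predictions from a straight-line
## model equals slope * (difference in x). Toy data for illustration only.
set.seed(7)
d <- data.frame(x = seq(10, 100, by = 10))
d$y <- 1 + 0.8 * d$x + rnorm(10)
m <- lm(y ~ x, data = d)
p <- predict(m, newdata = data.frame(x = c(200, 100)))
diff_pred <- unname(p[1] - p[2])   # contrast: prediction at 200 minus prediction at 100
all.equal(diff_pred, unname(coef(m)["x"]) * 100)  # TRUE
```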
## What if we wanted to calculate the percentage increase in yield associated with this change?
## Let's consider what a percentage change is:
## If a value changed from 15 to 20, what percentage change is this?
## 20/15 = 1.333, therefore a 33.3% increase
## But when working with model parameters, we can only work with + and -, not * and /
## To do this in a frequentist model, we need to use a log law trick:
## log(A) - log(B) = log(A/B)
## So what we need to do is:
## - take the cell means
## - log them
## - subtract them
## - then back-transform (exp)
fert_lm |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs() |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER200 - FERTILIZER100 1.609737 0.05094673 8 1.431306 1.810412
t.ratio p.value
1 9.344485 1.404817e-05
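The log trick itself is just arithmetic: a difference on the log scale back-transforms to a ratio on the natural scale.

```r
## A difference of logs, back-transformed, is a ratio:
## exp(log(A) - log(B)) is identical to A / B
A <- 20
B <- 15
ratio <- exp(log(A) - log(B))    # 1.333..., i.e. a ratio of 4/3
pct_change <- (ratio - 1) * 100  # 33.3% increase
```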
## And if we wanted to express it as a decline from FERTILIZER 200 to 100
fert_lm |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs(reverse = TRUE) |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER100 - FERTILIZER200 0.6212194 0.05094673 8 0.5523605 0.6986624
t.ratio p.value
1 -9.344485 1.404817e-05
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
200.00 | 214.21 | 7.97 | [195.84, 232.58]
100.00 | 133.07 | 6.78 | [117.44, 148.70]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(200, 100)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(8) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [61.84, 100.43] | 8.37 | 9.70 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## OR
fert_grid <- fert_lm |>
insight::get_datagrid(at = list(FERTILIZER = c(200, 100)))
fert_lm |> estimate_contrasts("FERTILIZER", fert_grid)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(8) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [61.84, 100.43] | 8.37 | 9.70 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## testing a specific hypothesis
## Probability of getting our estimate if the slope was 1
fert_lm1 |> multcomp::glht(linfct = c("`center(FERTILIZER)` == 1")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = YIELD ~ center(FERTILIZER), data = fert)
Linear Hypotheses:
Estimate Std. Error t value Pr(>|t|)
center(FERTILIZER) == 1 0.81139 0.08367 -2.254 0.0542 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
## Can't ask the probability that the slope is equal to something in a frequentist framework
## If we wanted to know the probability that the slope was greater than
## 1, the closest we could get is
fert_lm1 |> multcomp::glht(linfct = c("`center(FERTILIZER)` >= 0.9")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: lm(formula = YIELD ~ center(FERTILIZER), data = fert)
Linear Hypotheses:
Estimate Std. Error t value Pr(<t)
center(FERTILIZER) >= 0.9 0.81139 0.08367 -1.059 0.16
(Adjusted p values reported -- single-step method)
## ## testing a specific hypothesis
## ## Probability of getting our estimate if the slope was 1
## fert_lm1 |> brms::hypothesis("FERTILIZER = 1")
## ## Can't ask the probability that the slope is equal to something in a frequentist framework
## ## If we wanted to know the probability that the slope was greater than
## ## 1, the closest we could get is
## fert_lm1 |> brms::hypothesis("FERTILIZER > 0.9")
We could explore the expected change in Yield associated with a specific increase in Fertiliser. For example, how much do we expect Yield to increase when increasing Fertiliser concentration from 100 to 200?
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 9.697 <.0001
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100
Confidence level used: 0.95
contrast estimate SE df lower.CL upper.CL t.ratio
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100 9.697
p.value
<.0001
Confidence level used: 0.95
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 9.697 <.0001
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.97 8 196 233
100 133 6.78 8 117 149
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 8.37 8 61.8 100
Confidence level used: 0.95
## What if we wanted to calculate the percentage increase in yield associated with this change?
## Let's consider what a percentage change is:
## If a value changed from 15 to 20, what percentage change is this?
## 20/15 = 1.333, therefore a 33.3% increase
## But when working with model parameters, we can only work with + and -, not * and /
## To do this in a frequentist model, we need to use a log law trick:
## log(A) - log(B) = log(A/B)
## So what we need to do is:
## - take the cell means
## - log them
## - subtract them
## - then back-transform (exp)
fert_lm1 |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs() |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER200 - FERTILIZER100 1.609737 0.05094673 8 1.431306 1.810412
t.ratio p.value
1 9.344485 1.404817e-05
## And if we wanted to express it as a decline from FERTILIZER 200 to 100
fert_lm1 |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs(reverse = TRUE) |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER100 - FERTILIZER200 0.6212194 0.05094673 8 0.5523605 0.6986624
t.ratio p.value
1 -9.344485 1.404817e-05
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
200.00 | 214.21 | 7.97 | [195.84, 232.58]
100.00 | 133.07 | 6.78 | [117.44, 148.70]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(200, 100)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(8) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [61.84, 100.43] | 8.37 | 9.70 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## OR
fert_grid <- fert_lm1 |>
insight::get_datagrid(at = list(FERTILIZER = c(200, 100)))
fert_lm1 |> estimate_contrasts("FERTILIZER", fert_grid)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(8) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [61.84, 100.43] | 8.37 | 9.70 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## testing a specific hypothesis
## Probability of getting our estimate if the slope was 1
fert_glmmTMB |> multcomp::glht(linfct = c("FERTILIZER == 1")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: glmmTMB::glmmTMB(formula = YIELD ~ FERTILIZER, data = fert, ziformula = ~0,
dispformula = ~1)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
FERTILIZER == 1 0.81139 0.07484 -2.52 0.0117 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
## Can't ask the probability that the slope is equal to something in a frequentist framework
## If we wanted to know the probability that the slope was greater than
## 1, the closest we could get is
fert_glmmTMB |> multcomp::glht(linfct = c("FERTILIZER >= 0.9")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: glmmTMB::glmmTMB(formula = YIELD ~ FERTILIZER, data = fert, ziformula = ~0,
dispformula = ~1)
Linear Hypotheses:
Estimate Std. Error z value Pr(<z)
FERTILIZER >= 0.9 0.81139 0.07484 -1.184 0.118
(Adjusted p values reported -- single-step method)
## testing a specific hypothesis
## Probability of getting our estimate if the slope was 1
fert_lm |> brms::hypothesis("FERTILIZER = 1")
Error in as.data.frame.default(x): cannot coerce class '"lm"' to a data.frame
## Can't ask the probability that the slope is equal to something in a frequentist framework
## If we wanted to know the probability that the slope was greater than
## 1, the closest we could get is
fert_lm |> brms::hypothesis("FERTILIZER > 0.9")
Error in as.data.frame.default(x): cannot coerce class '"lm"' to a data.frame
We could explore the expected change in Yield associated with a specific increase in Fertiliser. For example, how much do we expect Yield to increase when increasing Fertiliser concentration from 100 to 200?
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 10.842 <.0001
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8
Confidence level used: 0.95
contrast estimate SE df lower.CL upper.CL t.ratio
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8 10.842
p.value
<.0001
Confidence level used: 0.95
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 10.842 <.0001
## or with confidence intervals
fert_glmmTMB |> emmeans(pairwise~FERTILIZER, at = newdata) |> confint()
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8
Confidence level used: 0.95
## What if we wanted to calculate the percentage increase in yield associated with this change?
## Let's consider what a percentage change is:
## If a value changed from 15 to 20, what percentage change is this?
## 20/15 = 1.333, therefore a 33.3% increase
## But when working with model parameters, we can only work with + and -, not * and /
## To do this in a frequentist model, we need to use a log law trick:
## log(A) - log(B) = log(A/B)
## So what we need to do is:
## - take the cell means
## - log them
## - subtract them
## - then back-transform (exp)
fert_glmmTMB |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs() |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER200 - FERTILIZER100 1.609737 0.04556814 7 1.445304 1.792879
t.ratio p.value
1 10.44745 1.602e-05
## And if we wanted to express it as a decline from FERTILIZER 200 to 100
fert_glmmTMB |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs(reverse = TRUE) |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER100 - FERTILIZER200 0.6212194 0.04556814 7 0.5577622 0.6918961
t.ratio p.value
1 -10.44745 1.602e-05
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
200.00 | 214.21 | 7.12 | [200.25, 228.18]
100.00 | 133.07 | 6.06 | [121.19, 144.95]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(200, 100)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(7) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [63.44, 98.84] | 7.48 | 10.84 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## OR
fert_grid <- fert_glmmTMB |>
insight::get_datagrid(at = list(FERTILIZER = c(200, 100)))
fert_glmmTMB |> estimate_contrasts("FERTILIZER", fert_grid)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(7) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [63.44, 98.84] | 7.48 | 10.84 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## testing a specific hypothesis
## Probability of getting our estimate if the slope was 1
fert_glmmTMB1 |> multcomp::glht(linfct = c("`center(FERTILIZER)` == 1")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: glmmTMB::glmmTMB(formula = YIELD ~ center(FERTILIZER), data = fert,
ziformula = ~0, dispformula = ~1)
Linear Hypotheses:
Estimate Std. Error z value Pr(>|z|)
center(FERTILIZER) == 1 0.81139 0.07484 -2.52 0.0117 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
## Can't ask the probability that the slope is equal to something in a frequentist framework
## If we wanted to know the probability that the slope was greater than
## 1, the closest we could get is
fert_glmmTMB1 |> multcomp::glht(linfct = c("`center(FERTILIZER)` >= 0.9")) |> summary()
Simultaneous Tests for General Linear Hypotheses
Fit: glmmTMB::glmmTMB(formula = YIELD ~ center(FERTILIZER), data = fert,
ziformula = ~0, dispformula = ~1)
Linear Hypotheses:
Estimate Std. Error z value Pr(<z)
center(FERTILIZER) >= 0.9 0.81139 0.07484 -1.184 0.118
(Adjusted p values reported -- single-step method)
## testing a specific hypothesis
## Probability of getting our estimate if the slope was 1
fert_lm1 |> brms::hypothesis("FERTILIZER = 1")
Error in as.data.frame.default(x): cannot coerce class '"lm"' to a data.frame
## Can't ask the probability that the slope is equal to something in a frequentist framework
## If we wanted to know the probability that the slope was greater than
## 1, the closest we could get is
fert_lm1 |> brms::hypothesis("FERTILIZER > 0.9")
Error in as.data.frame.default(x): cannot coerce class '"lm"' to a data.frame
We could explore the expected change in Yield associated with a specific increase in Fertiliser. For example, how much do we expect Yield to increase when increasing Fertiliser concentration from 100 to 200?
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 10.842 <.0001
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8
Confidence level used: 0.95
contrast estimate SE df lower.CL upper.CL t.ratio
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8 10.842
p.value
<.0001
Confidence level used: 0.95
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
$contrasts
contrast estimate SE df t.ratio p.value
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 10.842 <.0001
## or with confidence intervals
fert_glmmTMB1 |> emmeans(pairwise~FERTILIZER, at = newdata) |> confint()
$emmeans
FERTILIZER emmean SE df lower.CL upper.CL
200 214 7.12 7 197 231
100 133 6.06 7 119 147
Confidence level used: 0.95
$contrasts
contrast estimate SE df lower.CL upper.CL
FERTILIZER200 - FERTILIZER100 81.1 7.48 7 63.4 98.8
Confidence level used: 0.95
## What if we wanted to calculate the percentage increase in yield associated with this change?
## Let's consider what a percentage change is:
## If a value changed from 15 to 20, what percentage change is this?
## 20/15 = 1.333, therefore a 33.3% increase
## But when working with model parameters, we can only work with + and -, not * and /
## To do this in a frequentist model, we need to use a log law trick:
## log(A) - log(B) = log(A/B)
## So what we need to do is:
## - take the cell means
## - log them
## - subtract them
## - then back-transform (exp)
fert_glmmTMB1 |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs() |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER200 - FERTILIZER100 1.609737 0.04556814 7 1.445304 1.792879
t.ratio p.value
1 10.44745 1.602001e-05
## And if we wanted to express it as a decline from FERTILIZER 200 to 100
fert_glmmTMB1 |>
emmeans(~FERTILIZER, at = newdata) |>
regrid(transform = "log") |>
pairs(reverse = TRUE) |>
summary(infer = TRUE) |>
mutate(across(c(estimate, lower.CL, upper.CL), exp)) |>
as.data.frame()
contrast estimate SE df lower.CL upper.CL
1 FERTILIZER100 - FERTILIZER200 0.6212194 0.04556814 7 0.5577623 0.6918961
t.ratio p.value
1 -10.44745 1.602001e-05
Model-based Expectation
FERTILIZER | Predicted | SE | 95% CI
------------------------------------------------
200.00 | 214.21 | 7.12 | [200.25, 228.18]
100.00 | 133.07 | 6.06 | [121.19, 144.96]
Variable predicted: YIELD
Predictors modulated: FERTILIZER = c(200, 100)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(7) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [63.44, 98.84] | 7.48 | 10.84 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
## OR
fert_grid <- fert_glmmTMB1 |>
insight::get_datagrid(at = list(FERTILIZER = c(200, 100)))
fert_glmmTMB1 |> estimate_contrasts("FERTILIZER", fert_grid)
Marginal Contrasts Analysis
Level1 | Level2 | Difference | 95% CI | SE | t(7) | p
-----------------------------------------------------------------------------------
FERTILIZER200 | FERTILIZER100 | 81.14 | [63.44, 98.84] | 7.48 | 10.84 | < .001
Marginal contrasts estimated at FERTILIZER
p-value adjustment method: Holm (1979)
11 Summary figures
Although there are numerous easy-to-use routines that will generate partial plots (see above), it is also useful to be able to produce the data behind the figures yourself. That way, you have more control over the type and style of the figures.
Producing a summary figure is essentially plotting the results of predictions. In the case of a trend, we want a series of predictions associated with a sequence of predictor values. Hence, we establish a prediction grid that contains the sequence of values for the predictor we would like to display.
There are a number of functions we can use to generate different sequences of prediction values. For example, we may want a sequence of values from the lowest to the highest observed predictor value. Alternatively, we may just want to explore predictions at the first, third and fifth quantiles. The following table indicates some of the functions that are helpful for these purposes.
| Function | Values returned |
|---|---|
| modelr::seq_range() | equal increment values from smallest to largest |
| Hmisc::smean_sdl() | mean as well as mean plus/minus one standard deviation |
| Hmisc::smedian_hilow() | median as well as min and max |
| Hmisc::smean.cl.normal() | mean as well as lower and upper 95% confidence interval |
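As a rough guide to what the first of these does, modelr::seq_range(x, n) behaves much like base R's seq() over the observed range of the predictor; a minimal sketch on made-up values:

```r
## Base-R equivalent of an equal-increment prediction grid spanning the
## observed range of a predictor. Values are made up for illustration.
x <- c(25, 150, 75, 250, 100)
grid <- seq(min(x), max(x), length.out = 100)
head(grid)
```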
## Using emmeans
fert_grid <- with(fert, list(FERTILIZER = seq_range(FERTILIZER, n = 100)))
## OR
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_lm |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER emmean SE df lower.CL upper.CL
25.00000 72.21818 11.16695 8 46.46715 97.96921
27.27273 74.06226 11.00713 8 48.67977 99.44475
29.54545 75.90634 10.84830 8 50.89012 100.92255
31.81818 77.75041 10.69048 8 53.09812 102.40271
34.09091 79.59449 10.53374 8 55.30365 103.88533
36.36364 81.43857 10.37811 8 57.50661 105.37053
Confidence level used: 0.95
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3)), breaks=c(100,150,200,250))+
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1)))+
theme_classic()
For more complex models that include additional covariates, the partial plot represents the expected trend between the response and focal predictor holding all other predictors constant. This is akin to plotting the trend between the response and focal predictor after standardising first for the other covariate(s).
When this is the case, it is not always appropriate to overlay a partial plot with the raw data (as these data have not been standardised and will not necessarily represent the partial effects). Arguably, a more appropriate way to represent the data points is to calculate standardised versions of the points by adding the residuals onto the fitted values associated with each of the observed values of the focal predictor. When there is only a single predictor, as is the case here, the result will be identical to just overlaying the raw data.
We can do this by leveraging the ggemmeans() function (or ggpredict() if we want conditional predictions).
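The partial-observation calculation itself is straightforward in base R: the fitted value at each observed predictor value plus the corresponding residual. A sketch on toy data (not the fertiliser data):

```r
## Base-R sketch of partial observations: fitted + residual. With a single
## predictor this recovers the raw response exactly. Toy data only.
set.seed(42)
d <- data.frame(x = 1:10)
d$y <- 2 + 3 * d$x + rnorm(10)
m <- lm(y ~ x, data = d)
partial_obs <- fitted(m) + resid(m)
all.equal(unname(partial_obs), d$y)  # TRUE: identical to the raw response
```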
## Using emmeans
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_lm |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER emmean SE df lower.CL upper.CL
25.00000 72.21818 11.16695 8 46.46715 97.96921
27.27273 74.06226 11.00713 8 48.67977 99.44475
29.54545 75.90634 10.84830 8 50.89012 100.92255
31.81818 77.75041 10.69048 8 53.09812 102.40271
34.09091 79.59449 10.53374 8 55.30365 103.88533
36.36364 81.43857 10.37811 8 57.50661 105.37053
Confidence level used: 0.95
## Now generate partial residuals
fitted_grid <- fert
fit <- fert_lm |> emmeans(~FERTILIZER, at = fitted_grid) |> as.data.frame() |>
pull(emmean)
resid_newdata <- fert |>
mutate(Fit = fit,
Resid = resid(fert_lm),
Partial.obs = Fit + Resid)
resid_newdata |> head()
# A tibble: 6 × 5
FERTILIZER YIELD Fit Resid Partial.obs
<dbl> <dbl> <dbl> <dbl> <dbl>
1 25 84 72.2 11.8 84
2 50 80 92.5 -12.5 80
3 75 90 113. -22.8 90
4 100 154 133. 20.9 154
5 125 148 153. -5.36 148
6 150 169 174. -4.64 169
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
fert_grid <- fert_lm |>
insight::get_datagrid(at = "FERTILIZER", length = 100)
newdata <- fert_lm |>
estimate_expectation(data = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER Predicted SE CI_low CI_high
1 25.00000 72.21818 11.16695 46.46715 97.96921
2 27.27273 74.06226 11.00713 48.67977 99.44475
3 29.54545 75.90634 10.84830 50.89012 100.92255
4 31.81818 77.75041 10.69048 53.09812 102.40271
5 34.09091 79.59449 10.53374 55.30365 103.88533
6 36.36364 81.43857 10.37811 57.50661 105.37053
## Now generate partial residuals
resid_grid <- fert |>
insight::get_data()
resid_newdata <- fert_lm |>
estimate_expectation(data = resid_grid) |>
as.data.frame() |>
mutate(Partial.obs = Predicted + Residuals)
ggplot(newdata, aes(y = Predicted, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
## Manually, using the model matrix
newdata <- with(fert, data.frame(FERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100)))
Xmat <- model.matrix(~FERTILIZER, data = newdata)
coefs <- coef(fert_lm)
preds <- coefs %*% t(Xmat)
se <- sqrt(diag(Xmat %*% vcov(fert_lm) %*% t(Xmat)))
fert_df <- df.residual(fert_lm)
newdata <- newdata |>
mutate(fit = as.vector(preds),
lower = fit - qt(0.975, df = fert_df) * se,
upper = fit + qt(0.975, df = fert_df) * se)
ggplot(newdata, aes(y = fit, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
## Using emmeans
fert_grid1 <- with(fert, list(FERTILIZER = seq_range(FERTILIZER, n = 100)))
## OR
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_lm1 |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER emmean SE df lower.CL upper.CL
25.00000 72.21818 11.16695 8 46.46715 97.96921
27.27273 74.06226 11.00713 8 48.67977 99.44475
29.54545 75.90634 10.84830 8 50.89012 100.92255
31.81818 77.75041 10.69048 8 53.09812 102.40271
34.09091 79.59449 10.53374 8 55.30365 103.88533
36.36364 81.43857 10.37811 8 57.50661 105.37053
Confidence level used: 0.95
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3)), breaks=c(100,150,200,250))+
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1)))+
theme_classic()
For more complex models that include additional covariates, the partial plot represents the expected trend between the response and focal predictor holding all other predictors constant. This is akin to plotting the trend between the response and focal predictor after standardising first for the other covariate(s).
When this is the case, it is not always appropriate to overlay a partial plot with the raw data (as these data have not been standardised and will not necessarily represent the partial effects). Arguably, a more appropriate way to represent the data points is to calculate standardised versions of the points by adding the residuals onto the fitted values associated with each of the observed values of the focal predictor. When there is only a single predictor, as is the case here, the result will be identical to just overlaying the raw data.
We can do this by leveraging the ggemmeans() function (or ggpredict() if we want conditional predictions).
## Using emmeans
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_lm1 |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER emmean SE df lower.CL upper.CL
25.00000 72.21818 11.16695 8 46.46715 97.96921
27.27273 74.06226 11.00713 8 48.67977 99.44475
29.54545 75.90634 10.84830 8 50.89012 100.92255
31.81818 77.75041 10.69048 8 53.09812 102.40271
34.09091 79.59449 10.53374 8 55.30365 103.88533
36.36364 81.43857 10.37811 8 57.50661 105.37053
Confidence level used: 0.95
## Now generate partial residuals
fitted_grid <- fert
fit <- fert_lm1 |> emmeans(~FERTILIZER, at = fitted_grid) |> as.data.frame() |>
pull(emmean)
resid_newdata <- fert |>
mutate(Fit = fit,
Resid = resid(fert_lm1),
Partial.obs = Fit + Resid)
resid_newdata |> head()
# A tibble: 6 × 5
FERTILIZER YIELD Fit Resid Partial.obs
<dbl> <dbl> <dbl> <dbl> <dbl>
1 25 84 72.2 11.8 84
2 50 80 92.5 -12.5 80
3 75 90 113. -22.8 90.0
4 100 154 133. 20.9 154
5 125 148 153. -5.36 148
6 150 169 174. -4.64 169
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
fert_grid <- fert_lm1 |>
insight::get_datagrid(at = "FERTILIZER", length = 100)
newdata <- fert_lm1 |>
estimate_expectation(data = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER Predicted SE CI_low CI_high
1 25.00000 72.21818 11.16695 46.46715 97.96921
2 27.27273 74.06226 11.00713 48.67977 99.44475
3 29.54545 75.90634 10.84830 50.89012 100.92255
4 31.81818 77.75041 10.69048 53.09812 102.40271
5 34.09091 79.59449 10.53374 55.30365 103.88533
6 36.36364 81.43857 10.37811 57.50661 105.37053
## Now generate partial residuals
resid_grid <- fert |>
insight::get_data()
resid_newdata <- fert_lm1 |>
estimate_expectation(data = resid_grid) |>
as.data.frame() |>
mutate(Partial.obs = Predicted + Residuals)
ggplot(newdata, aes(y = Predicted, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
## Manually, using the model matrix
newdata <- fert |>
reframe(nFERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100),
cFERTILIZER = nFERTILIZER - mean(FERTILIZER)) |>
rename(FERTILIZER = nFERTILIZER)
Xmat <- model.matrix(~cFERTILIZER, data = newdata)
coefs <- coef(fert_lm1)
preds <- coefs %*% t(Xmat)
se <- sqrt(diag(Xmat %*% vcov(fert_lm1) %*% t(Xmat)))
fert_df <- df.residual(fert_lm1)
newdata <- newdata |>
mutate(fit = as.vector(preds),
lower = fit - qt(0.975, df = fert_df) * se,
upper = fit + qt(0.975, df = fert_df) * se)
ggplot(newdata, aes(y = fit, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-3))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.ml^-1))) +
theme_classic()
## Using emmeans
fert_grid <- with(fert, list(FERTILIZER = seq_range(FERTILIZER, n = 100)))
## OR
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_glmmTMB |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
FERTILIZER emmean SE df lower.CL upper.CL
25.00000 72.21817 9.988021 7 48.60025 95.83608
27.27273 74.06224 9.845078 7 50.78233 97.34215
29.54545 75.90632 9.703010 7 52.96235 98.85029
31.81818 77.75040 9.561857 7 55.14020 100.36060
34.09091 79.59448 9.421659 7 57.31579 101.87316
36.36364 81.43855 9.282461 7 59.48902 103.38809
Confidence level used: 0.95
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2)), breaks = c(100, 150, 200, 250)) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
For more complex models that include additional covariates, the partial plot represents the expected trend between the response and the focal predictor holding all other predictors constant. This is akin to plotting the trend between the response and the focal predictor after first standardising for the other covariate(s).
When this is the case, it is not always appropriate to overlay a partial plot with the raw data (as these data have not been standardised and will not necessarily represent the partial effects). Arguably, a more appropriate way to represent the data points is to calculate a standardised version of each point by adding the residuals onto the fitted values associated with each observed value of the focal predictor. When there is only a single predictor, as is the case here, the result is identical to simply overlaying the raw data.
We can do this either by leveraging the ggemmeans() function (or ggpredict() if we want conditional predictions), or by computing the partial residuals manually.
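As a sketch of the ggemmeans() route (assuming the fert_glmmTMB model fitted earlier; note that the plot() argument for overlaying partial residuals is show_residuals in recent ggeffects releases, and was residuals in older ones):

```r
## Marginal trend plus partial residuals via ggeffects
## (a sketch; assumes fert_glmmTMB from earlier)
library(ggeffects)
pe <- ggemmeans(fert_glmmTMB, terms = "FERTILIZER [all]")
plot(pe, show_residuals = TRUE)
```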
## Using emmeans
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_glmmTMB |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
 FERTILIZER   emmean       SE df lower.CL upper.CL
25.00000 72.21817 9.988021 7 48.60025 95.83608
27.27273 74.06224 9.845078 7 50.78233 97.34215
29.54545 75.90632 9.703010 7 52.96235 98.85029
31.81818 77.75040 9.561857 7 55.14020 100.36060
34.09091 79.59448 9.421659 7 57.31579 101.87316
36.36364 81.43855 9.282461 7 59.48902 103.38809
Confidence level used: 0.95
## Now generate partial residuals
fitted_grid <- fert
fit <- fert_glmmTMB |> emmeans(~FERTILIZER, at = fitted_grid) |> as.data.frame() |>
pull(emmean)
resid_newdata <- fert |>
mutate(Fit = fit,
Resid = resid(fert_glmmTMB),
Partial.obs = Fit + Resid)
resid_newdata |> head()
# A tibble: 6 × 5
FERTILIZER YIELD Fit Resid Partial.obs
<dbl> <dbl> <dbl> <dbl> <dbl>
1 25 84 72.2 11.8 84
2 50 80 92.5 -12.5 80
3 75 90 113. -22.8 90
4 100 154 133. 20.9 154
5 125 148 153. -5.36 148
6 150 169 174. -4.64 169
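With a single predictor, the partial observations simply reconstruct the raw data (fitted value + residual = observation), which we can confirm directly (assuming resid_newdata from above):

```r
## With one predictor, fitted + residual recovers the raw observation exactly
all.equal(resid_newdata$Partial.obs, resid_newdata$YIELD)
```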
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
fert_grid <- fert_glmmTMB |>
insight::get_datagrid(at = "FERTILIZER", length = 100)
newdata <- fert_glmmTMB |>
estimate_expectation(data = fert_grid) |>
as.data.frame()
newdata |> head()
  FERTILIZER Predicted       SE   CI_low  CI_high
1 25.00000 72.21817 9.988021 52.64200 91.79433
2 27.27273 74.06224 9.845078 54.76625 93.35824
3 29.54545 75.90632 9.703010 56.88877 94.92387
4 31.81818 77.75040 9.561857 59.00950 96.49129
5 34.09091 79.59448 9.421659 61.12836 98.06059
6 36.36364 81.43855 9.282461 63.24526 99.63184
## Now generate partial residuals
resid_grid <- fert_glmmTMB |>
insight::get_data()
resid_newdata <- fert_glmmTMB |>
estimate_expectation(data = resid_grid) |>
as.data.frame() |>
mutate(Partial.obs = Predicted + Residuals)
ggplot(newdata, aes(y = Predicted, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
## Manually using the model matrix
newdata <- with(fert, data.frame(FERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100)))
Xmat <- model.matrix(~FERTILIZER, data = newdata)
coefs <- fixef(fert_glmmTMB)[[1]]
preds <- coefs %*% t(Xmat)
se <- sqrt(diag(Xmat %*% vcov(fert_glmmTMB)[[1]] %*% t(Xmat)))
fert_df <- df.residual(fert_glmmTMB)
newdata <- newdata |>
mutate(fit = as.vector(preds),
lower = fit - qt(0.975, df = fert_df) * se,
upper = fit + qt(0.975, df = fert_df) * se)
ggplot(newdata, aes(y = fit, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
## Using emmeans
fert_grid <- with(fert, list(FERTILIZER = seq_range(FERTILIZER, n = 100)))
## OR
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_glmmTMB1 |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
 FERTILIZER   emmean       SE df lower.CL upper.CL
25.00000 72.21818 9.988021 7 48.60026 95.83610
27.27273 74.06226 9.845078 7 50.78235 97.34217
29.54545 75.90633 9.703010 7 52.96236 98.85031
31.81818 77.75041 9.561857 7 55.14021 100.36061
34.09091 79.59449 9.421659 7 57.31580 101.87317
36.36364 81.43857 9.282461 7 59.48903 103.38810
Confidence level used: 0.95
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2)), breaks = c(100, 150, 200, 250)) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
For more complex models that include additional covariates, the partial plot represents the expected trend between the response and the focal predictor holding all other predictors constant. This is akin to plotting the trend between the response and the focal predictor after first standardising for the other covariate(s).
When this is the case, it is not always appropriate to overlay a partial plot with the raw data (as these data have not been standardised and will not necessarily represent the partial effects). Arguably, a more appropriate way to represent the data points is to calculate a standardised version of each point by adding the residuals onto the fitted values associated with each observed value of the focal predictor. When there is only a single predictor, as is the case here, the result is identical to simply overlaying the raw data.
We can do this either by leveraging the ggemmeans() function (or ggpredict() if we want conditional predictions), or by computing the partial residuals manually.
## Using emmeans
fert_grid <- fert |> data_grid(FERTILIZER = seq_range(FERTILIZER, n = 100))
newdata <- fert_glmmTMB1 |>
emmeans(~FERTILIZER, at = fert_grid) |>
as.data.frame()
newdata |> head()
 FERTILIZER   emmean       SE df lower.CL upper.CL
25.00000 72.21818 9.988021 7 48.60026 95.83610
27.27273 74.06226 9.845078 7 50.78235 97.34217
29.54545 75.90633 9.703010 7 52.96236 98.85031
31.81818 77.75041 9.561857 7 55.14021 100.36061
34.09091 79.59449 9.421659 7 57.31580 101.87317
36.36364 81.43857 9.282461 7 59.48903 103.38810
Confidence level used: 0.95
## Now generate partial residuals
fitted_grid <- fert
fit <- fert_glmmTMB1 |> emmeans(~FERTILIZER, at = fitted_grid) |> as.data.frame() |>
pull(emmean)
resid_newdata <- fert |>
mutate(Fit = fit,
Resid = resid(fert_glmmTMB1),
Partial.obs = Fit + Resid)
resid_newdata |> head()
# A tibble: 6 × 5
FERTILIZER YIELD Fit Resid Partial.obs
<dbl> <dbl> <dbl> <dbl> <dbl>
1 25 84 72.2 11.8 84
2 50 80 92.5 -12.5 80
3 75 90 113. -22.8 90
4 100 154 133. 20.9 154
5 125 148 153. -5.36 148
6 150 169 174. -4.64 169
ggplot(newdata, aes(y = emmean, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = lower.CL, ymax = upper.CL), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
fert_grid <- fert_glmmTMB1 |>
insight::get_datagrid(at = "FERTILIZER", length = 100)
newdata <- fert_glmmTMB1 |>
estimate_expectation(data = fert_grid) |>
as.data.frame()
newdata |> head()
  FERTILIZER Predicted       SE   CI_low  CI_high
1 25.00000 72.21818 9.988021 52.64202 91.79434
2 27.27273 74.06226 9.845078 54.76626 93.35826
3 29.54545 75.90633 9.703010 56.88878 94.92388
4 31.81818 77.75041 9.561857 59.00952 96.49131
5 34.09091 79.59449 9.421659 61.12838 98.06060
6 36.36364 81.43857 9.282461 63.24528 99.63186
## Now generate partial residuals
resid_grid <- fert_glmmTMB1 |>
insight::get_data()
resid_newdata <- fert_glmmTMB1 |>
estimate_expectation(data = resid_grid) |>
as.data.frame() |>
mutate(Partial.obs = Predicted + Residuals)
ggplot(newdata, aes(y = Predicted, x = FERTILIZER)) +
geom_point(data = resid_newdata, aes(y = Partial.obs)) +
geom_ribbon(aes(ymin = CI_low, ymax = CI_high), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
## Manually using the model matrix
newdata <- fert |>
reframe(nFERTILIZER = seq(min(FERTILIZER), max(FERTILIZER), len = 100),
cFERTILIZER = nFERTILIZER - mean(FERTILIZER)) |>
rename(FERTILIZER = nFERTILIZER)
Xmat <- model.matrix(~cFERTILIZER, data = newdata)
coefs <- fixef(fert_glmmTMB1)[[1]]
preds <- coefs %*% t(Xmat)
se <- sqrt(diag(Xmat %*% vcov(fert_glmmTMB1)[[1]] %*% t(Xmat)))
fert_df <- df.residual(fert_glmmTMB1)
newdata <- newdata |>
mutate(fit = as.vector(preds),
lower = fit - qt(0.975, df = fert_df) * se,
upper = fit + qt(0.975, df = fert_df) * se)
ggplot(newdata, aes(y = fit, x = FERTILIZER)) +
geom_point(data = fert, aes(y = YIELD)) +
geom_ribbon(aes(ymin = lower, ymax = upper), fill = "blue", alpha = 0.3) +
geom_line() +
scale_y_continuous(expression(Grass~yield~(g.m^-2))) +
scale_x_continuous(expression(Fertilizer~concentration~(g.m^-2))) +
theme_classic()
12 Reporting
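The methods/results paragraphs below can be generated automatically with the report package (loaded in the preparations); a minimal sketch, assuming the fert_lm1 model object from earlier:

```r
## Auto-generate a methods/results summary from the fitted model
## (a sketch; assumes fert_lm1 is the lm fitted earlier)
library(report)
report(fert_lm1)
```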
We fitted a linear model (estimated using OLS) to predict YIELD with FERTILIZER
(formula: YIELD ~ FERTILIZER). The model explains a statistically significant
and substantial proportion of variance (R2 = 0.92, F(1, 8) = 94.04, p < .001,
adj. R2 = 0.91). The model's intercept, corresponding to FERTILIZER = 0, is at
51.93 (95% CI [22.00, 81.86], t(8) = 4.00, p = 0.004). Within this model:
- The effect of FERTILIZER is statistically significant and positive (beta =
0.81, 95% CI [0.62, 1.00], t(8) = 9.70, p < .001; Std. beta = 0.96, 95% CI
[0.73, 1.19])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
- The intercept is statistically significant and positive (beta = 51.93, 95% CI [22.00, 81.86], t(8) = 4.00, p = 0.004; Std. beta = 0.00, 95% CI [-0.22, 0.22])
- The effect of FERTILIZER is statistically significant and positive (beta = 0.81, 95% CI [0.62, 1.00], t(8) = 9.70, p < .001; Std. beta = 0.96, 95% CI [0.73, 1.19])
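The standardised coefficients (Std. beta) quoted above can be reproduced with the parameters package, which refits the model on a standardised version of the data; a sketch, assuming fert_lm1:

```r
## Standardised coefficients obtained by refitting on standardised data
## (a sketch; assumes fert_lm1 from earlier; parameters is part of easystats)
library(parameters)
model_parameters(fert_lm1, standardize = "refit")
```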
Analyses were conducted using the R Statistical language (version 4.3.1; R Core
Team, 2023) on ArcoLinux, using the packages effectsize (version 0.8.3;
Ben-Shachar MS et al., 2020), glmmTMB (version 1.1.7; Brooks ME et al., 2017),
effects (version 4.2.2; Fox J, Weisberg S, 2019), car (version 3.1.2; Fox J,
Weisberg S, 2019), carData (version 3.0.5; Fox J et al., 2022), lubridate
(version 1.9.2; Grolemund G, Wickham H, 2011), DHARMa (version 0.4.6; Hartig F,
2022), lindia (version 0.9; Lee Y, Ventura S, 2017), emmeans (version 1.8.7;
Lenth R, 2023), ggeffects (version 1.2.3; Lüdecke D, 2018), sjPlot (version
2.8.14; Lüdecke D, 2023), parameters (version 0.21.1; Lüdecke D et al., 2020),
performance (version 0.10.4; Lüdecke D et al., 2021), easystats (version 0.6.0;
Lüdecke D et al., 2022), see (version 0.7.4; Lüdecke D et al., 2021), insight
(version 0.19.3; Lüdecke D et al., 2019), bayestestR (version 0.13.0; Makowski
D et al., 2019), modelbased (version 0.8.6; Makowski D et al., 2020),
correlation (version 0.8.3; Makowski D et al., 2020), report (version 0.5.7;
Makowski D et al., 2023), tibble (version 3.2.1; Müller K, Wickham H, 2023),
datawizard (version 0.8.0; Patil I et al., 2022), broom (version 1.0.5;
Robinson D et al., 2023), ggfortify (version 0.4.16; Tang Y et al., 2016),
ggplot2 (version 3.4.2; Wickham H, 2016), stringr (version 1.5.0; Wickham H,
2022), forcats (version 1.0.0; Wickham H, 2023), modelr (version 0.1.11;
Wickham H, 2023), tidyverse (version 2.0.0; Wickham H et al., 2019), dplyr
(version 1.1.2; Wickham H et al., 2023), purrr (version 1.0.1; Wickham H, Henry
L, 2023), readr (version 2.1.4; Wickham H et al., 2023), tidyr (version 1.3.0;
Wickham H et al., 2023) and knitr (version 1.43; Xie Y, 2023).
References
----------
- Ben-Shachar MS, Lüdecke D, Makowski D (2020). "effectsize: Estimation of
Effect Size Indices and Standardized Parameters." _Journal of Open Source
Software_, *5*(56), 2815. doi:10.21105/joss.02815
<https://doi.org/10.21105/joss.02815>.
- Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A,
Skaug HJ, Maechler M, Bolker BM (2017). "glmmTMB Balances Speed and Flexibility
Among Packages for Zero-inflated Generalized Linear Mixed Modeling." _The R
Journal_, *9*(2), 378-400. doi:10.32614/RJ-2017-066
<https://doi.org/10.32614/RJ-2017-066>.
- Fox J, Weisberg S (2019). _An R Companion to Applied Regression_, 3rd
edition. Sage, Thousand Oaks CA.
<https://socialsciences.mcmaster.ca/jfox/Books/Companion/index.html>. Fox J,
Weisberg S (2018). "Visualizing Fit and Lack of Fit in Complex Regression
Models with Predictor Effect Plots and Partial Residuals." _Journal of
Statistical Software_, *87*(9), 1-27. doi:10.18637/jss.v087.i09
<https://doi.org/10.18637/jss.v087.i09>. Fox J (2003). "Effect Displays in R
for Generalised Linear Models." _Journal of Statistical Software_, *8*(15),
1-27. doi:10.18637/jss.v008.i15 <https://doi.org/10.18637/jss.v008.i15>. Fox J,
Hong J (2009). "Effect Displays in R for Multinomial and Proportional-Odds
Logit Models: Extensions to the effects Package." _Journal of Statistical
Software_, *32*(1), 1-24. doi:10.18637/jss.v032.i01
<https://doi.org/10.18637/jss.v032.i01>.
- Fox J, Weisberg S (2019). _An R Companion to Applied Regression_, Third
edition. Sage, Thousand Oaks CA.
<https://socialsciences.mcmaster.ca/jfox/Books/Companion/>.
- Fox J, Weisberg S, Price B (2022). _carData: Companion to Applied Regression
Data Sets_. R package version 3.0-5,
<https://CRAN.R-project.org/package=carData>.
- Grolemund G, Wickham H (2011). "Dates and Times Made Easy with lubridate."
_Journal of Statistical Software_, *40*(3), 1-25.
<https://www.jstatsoft.org/v40/i03/>.
- Hartig F (2022). _DHARMa: Residual Diagnostics for Hierarchical (Multi-Level
/ Mixed) Regression Models_. R package version 0.4.6,
<https://CRAN.R-project.org/package=DHARMa>.
- Lee Y, Ventura S (2017). _lindia: Automated Linear Regression Diagnostic_. R
package version 0.9, <https://CRAN.R-project.org/package=lindia>.
- Lenth R (2023). _emmeans: Estimated Marginal Means, aka Least-Squares Means_.
R package version 1.8.7, <https://CRAN.R-project.org/package=emmeans>.
- Lüdecke D (2018). "ggeffects: Tidy Data Frames of Marginal Effects from
Regression Models." _Journal of Open Source Software_, *3*(26), 772.
doi:10.21105/joss.00772 <https://doi.org/10.21105/joss.00772>.
- Lüdecke D (2023). _sjPlot: Data Visualization for Statistics in Social
Science_. R package version 2.8.14,
<https://CRAN.R-project.org/package=sjPlot>.
- Lüdecke D, Ben-Shachar M, Patil I, Makowski D (2020). "Extracting, Computing
and Exploring the Parameters of Statistical Models using R." _Journal of Open
Source Software_, *5*(53), 2445. doi:10.21105/joss.02445
<https://doi.org/10.21105/joss.02445>.
- Lüdecke D, Ben-Shachar M, Patil I, Waggoner P, Makowski D (2021).
"performance: An R Package for Assessment, Comparison and Testing of
Statistical Models." _Journal of Open Source Software_, *6*(60), 3139.
doi:10.21105/joss.03139 <https://doi.org/10.21105/joss.03139>.
- Lüdecke D, Ben-Shachar M, Patil I, Wiernik B, Makowski D (2022). "easystats:
Framework for Easy Statistical Modeling, Visualization, and Reporting." _CRAN_.
R package, <https://easystats.github.io/easystats/>.
- Lüdecke D, Patil I, Ben-Shachar M, Wiernik B, Waggoner P, Makowski D (2021).
"see: An R Package for Visualizing Statistical Models." _Journal of Open Source
Software_, *6*(64), 3393. doi:10.21105/joss.03393
<https://doi.org/10.21105/joss.03393>.
- Lüdecke D, Waggoner P, Makowski D (2019). "insight: A Unified Interface to
Access Information from Model Objects in R." _Journal of Open Source Software_,
*4*(38), 1412. doi:10.21105/joss.01412 <https://doi.org/10.21105/joss.01412>.
- Makowski D, Ben-Shachar M, Lüdecke D (2019). "bayestestR: Describing Effects
and their Uncertainty, Existence and Significance within the Bayesian
Framework." _Journal of Open Source Software_, *4*(40), 1541.
doi:10.21105/joss.01541 <https://doi.org/10.21105/joss.01541>,
<https://joss.theoj.org/papers/10.21105/joss.01541>.
- Makowski D, Ben-Shachar M, Patil I, Lüdecke D (2020). "Estimation of
Model-Based Predictions, Contrasts and Means." _CRAN_.
<https://github.com/easystats/modelbased>.
- Makowski D, Ben-Shachar M, Patil I, Lüdecke D (2020). "Methods and Algorithms
for Correlation Analysis in R." _Journal of Open Source Software_, *5*(51),
2306. doi:10.21105/joss.02306 <https://doi.org/10.21105/joss.02306>,
<https://joss.theoj.org/papers/10.21105/joss.02306>.
- Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023).
"Automated Results Reporting as a Practical Tool to Improve Reproducibility and
Methodological Best Practices Adoption." _CRAN_.
<https://easystats.github.io/report/>.
- Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package version
3.2.1, <https://CRAN.R-project.org/package=tibble>.
- Patil I, Makowski D, Ben-Shachar M, Wiernik B, Bacher E, Lüdecke D (2022).
"datawizard: An R Package for Easy Data Preparation and Statistical
Transformations." _Journal of Open Source Software_, *7*(78), 4684.
doi:10.21105/joss.04684 <https://doi.org/10.21105/joss.04684>.
- R Core Team (2023). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
- Robinson D, Hayes A, Couch S (2023). _broom: Convert Statistical Objects into
Tidy Tibbles_. R package version 1.0.5,
<https://CRAN.R-project.org/package=broom>.
- Tang Y, Horikoshi M, Li W (2016). "ggfortify: Unified Interface to Visualize
Statistical Result of Popular R Packages." _The R Journal_, *8*(2), 474-485.
doi:10.32614/RJ-2016-060 <https://doi.org/10.32614/RJ-2016-060>. Horikoshi M,
Tang Y (2018). _ggfortify:
Data Visualization Tools for Statistical Analysis Results_.
<https://CRAN.R-project.org/package=ggfortify>.
- Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_.
Springer-Verlag New York. ISBN 978-3-319-24277-4,
<https://ggplot2.tidyverse.org>.
- Wickham H (2022). _stringr: Simple, Consistent Wrappers for Common String
Operations_. R package version 1.5.0,
<https://CRAN.R-project.org/package=stringr>.
- Wickham H (2023). _forcats: Tools for Working with Categorical Variables
(Factors)_. R package version 1.0.0,
<https://CRAN.R-project.org/package=forcats>.
- Wickham H (2023). _modelr: Modelling Functions that Work with the Pipe_. R
package version 0.1.11, <https://CRAN.R-project.org/package=modelr>.
- Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G,
Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K,
Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K,
Yutani H (2019). "Welcome to the tidyverse." _Journal of Open Source Software_,
*4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
- Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar
of Data Manipulation_. R package version 1.1.2,
<https://CRAN.R-project.org/package=dplyr>.
- Wickham H, Henry L (2023). _purrr: Functional Programming Tools_. R package
version 1.0.1, <https://CRAN.R-project.org/package=purrr>.
- Wickham H, Hester J, Bryan J (2023). _readr: Read Rectangular Text Data_. R
package version 2.1.4, <https://CRAN.R-project.org/package=readr>.
- Wickham H, Vaughan D, Girlich M (2023). _tidyr: Tidy Messy Data_. R package
version 1.3.0, <https://CRAN.R-project.org/package=tidyr>.
- Xie Y (2023). _knitr: A General-Purpose Package for Dynamic Report Generation
in R_. R package version 1.43, <https://yihui.org/knitr/>. Xie Y (2015).
_Dynamic Documents with R and knitr_, 2nd edition. Chapman and Hall/CRC, Boca
Raton, Florida. ISBN 978-1498716963, <https://yihui.org/knitr/>. Xie Y (2014).
"knitr: A Comprehensive Tool for Reproducible Research in R." In Stodden V,
Leisch F, Peng RD (eds.), _Implementing Reproducible Computational Research_.
Chapman and Hall/CRC. ISBN 978-1466561595.
Random effect variances not available. Returned R2 does not account for random effects.
We fitted a linear model (estimated using ML and nlminb optimizer) to predict
YIELD with FERTILIZER (formula: YIELD ~ FERTILIZER). The model's explanatory
power related to the fixed effects alone (marginal R2) is 1.00. The model's
intercept, corresponding to FERTILIZER = 0, is at 51.93 (95% CI [29.18, 74.69],
p < .001). Within this model:
- The effect of FERTILIZER is statistically significant and positive (beta = 0.81, 95% CI
[0.66, 0.96], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
We fitted a linear model (estimated using ML and nlminb optimizer) to predict YIELD with
FERTILIZER (formula: YIELD ~ FERTILIZER). The model's explanatory power related
to the fixed effects alone (marginal R2) is 1.00. The model's intercept,
corresponding to FERTILIZER = 0, is at 16.99 (95% CI [10.96, 26.34]). Within
this model:
- The effect of FERTILIZER is statistically significant and positive (beta = 0.81, 95% CI
[0.66, 0.96], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald z-distribution approximation.
- The intercept is statistically significant and positive (beta = 51.93, 95% CI [29.18, 74.69], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16])
- The effect of FERTILIZER is statistically NA and positive (beta = 16.99, 95% CI [10.96, 26.34]; Std. beta = 0.96, 95% CI [0.79, 1.13])
- The intercept is statistically significant and positive (beta = 0.81, 95% CI [0.66, 0.96], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16])
beta = 51.93, 95% CI [29.18, 74.69], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16]
beta = 16.99, 95% CI [10.96, 26.34]; Std. beta = 0.96, 95% CI [0.79, 1.13]
beta = 0.81, 95% CI [0.66, 0.96], p < .001; Std. beta = 6.36e-10, 95% CI [-0.16, 0.16]
Standardized parameters were obtained by fitting the model on a standardized version of the dataset. 95% Confidence Intervals (CIs) and p-values were computed using a Wald z-distribution approximation.
<https://joss.theoj.org/papers/10.21105/joss.02306>.
- Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023).
"Automated Results Reporting as a Practical Tool to Improve Reproducibility and
Methodological Best Practices Adoption." _CRAN_.
<https://easystats.github.io/report/>.
- Müller K, Wickham H (2023). _tibble: Simple Data Frames_. R package version
3.2.1, <https://CRAN.R-project.org/package=tibble>.
- Patil I, Makowski D, Ben-Shachar M, Wiernik B, Bacher E, Lüdecke D (2022).
"datawizard: An R Package for Easy Data Preparation and Statistical
Transformations." _Journal of Open Source Software_, *7*(78), 4684.
doi:10.21105/joss.04684 <https://doi.org/10.21105/joss.04684>.
- R Core Team (2023). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna, Austria.
<https://www.R-project.org/>.
- Robinson D, Hayes A, Couch S (2023). _broom: Convert Statistical Objects into
Tidy Tibbles_. R package version 1.0.5,
<https://CRAN.R-project.org/package=broom>.
- Tang Y, Horikoshi M, Li W (2016). "ggfortify: Unified Interface to Visualize
Statistical Result of Popular R Packages." _The R Journal_, *8*(2), 474-485.
doi:10.32614/RJ-2016-060 <https://doi.org/10.32614/RJ-2016-060>.
- Horikoshi M, Tang Y (2018). _ggfortify: Data Visualization Tools for
Statistical Analysis Results_. <https://CRAN.R-project.org/package=ggfortify>.
- Wickham H (2016). _ggplot2: Elegant Graphics for Data Analysis_.
Springer-Verlag New York. ISBN 978-3-319-24277-4,
<https://ggplot2.tidyverse.org>.
- Wickham H (2022). _stringr: Simple, Consistent Wrappers for Common String
Operations_. R package version 1.5.0,
<https://CRAN.R-project.org/package=stringr>.
- Wickham H (2023). _forcats: Tools for Working with Categorical Variables
(Factors)_. R package version 1.0.0,
<https://CRAN.R-project.org/package=forcats>.
- Wickham H (2023). _modelr: Modelling Functions that Work with the Pipe_. R
package version 0.1.11, <https://CRAN.R-project.org/package=modelr>.
- Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G,
Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K,
Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K,
Yutani H (2019). "Welcome to the tidyverse." _Journal of Open Source Software_,
*4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
- Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A Grammar
of Data Manipulation_. R package version 1.1.2,
<https://CRAN.R-project.org/package=dplyr>.
- Wickham H, Henry L (2023). _purrr: Functional Programming Tools_. R package
version 1.0.1, <https://CRAN.R-project.org/package=purrr>.
- Wickham H, Hester J, Bryan J (2023). _readr: Read Rectangular Text Data_. R
package version 2.1.4, <https://CRAN.R-project.org/package=readr>.
- Wickham H, Vaughan D, Girlich M (2023). _tidyr: Tidy Messy Data_. R package
version 1.3.0, <https://CRAN.R-project.org/package=tidyr>.
- Xie Y (2023). _knitr: A General-Purpose Package for Dynamic Report Generation
in R_. R package version 1.43, <https://yihui.org/knitr/>.
- Xie Y (2015). _Dynamic Documents with R and knitr_, 2nd edition. Chapman and
Hall/CRC, Boca Raton, Florida. ISBN 978-1498716963, <https://yihui.org/knitr/>.
- Xie Y (2014). "knitr: A Comprehensive Tool for Reproducible Research in R."
In Stodden V, Leisch F, Peng RD (eds.), _Implementing Reproducible
Computational Research_. Chapman and Hall/CRC. ISBN 978-1466561595.